Code
# load packages
libraries = c('stringr','RColorBrewer','dplyr','knitr',
'rethinking','rstan','StanHeaders','runjags')
sapply(libraries, require, character.only=T)Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores
Bayesian analysis, speech intelligibility, bounded outcomes, clustering, measurement error, outliers, heteroscedasticity, generalized linear latent and mixed models, robust regression models.
The purpose of this walk-through is to improve the transparency and replicability of the analysis for the study Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores (in press.). This digital document contains all the code and materials utilized in the study. Furthermore, the walk-through meticulously follows the When-to-Worry-and-How-to-Avoid-the-Misuse-of-Bayesian-Statistics checklist (WAMBS checklist) developed by Depaoli & van de Schoot (2017). The checklist outlines the ten crucial points that need careful scrutiny when employing Bayesian inference procedures.
WAMBS checklist
Questionnaire outlining the ten crucial points that need careful scrutiny when employing Bayesian inference procedures, with the ultimate goal of enhancing the transparency and replicability of Bayesian analysis (Depaoli & van de Schoot, 2017).
# load packages
libraries = c('stringr','RColorBrewer','dplyr','knitr',
'rethinking','rstan','StanHeaders','runjags')
sapply(libraries, require, character.only=T)# load functions
main_dir = '/home/josema/Desktop/1. Work/1 research/PhD Antwerp/#thesis/paper1'
source( file.path( main_dir, 'walkthrough/code', 'user-defined-functions.R') )In this walk-through, Section 3 introduces various background topics that are relevant to the present study. These topics enable readers to progress smoothly through this research. Specifically, Section 3.1 provides a brief explanation of how Bayesian inference procedures work and their importance for this research. Section 3.2 is devoted to explaining the difference between two particular distributions, the normal and the beta-proportion distribution, and their role on modeling bounded data. Section 3.3 explains the (generalized) linear mixed models, elaborating on their role in modeling (non)normal clustered and bounded data. Section 3.4 illustrates the concept of measurement error and the role of latent variables to overcome the problems arising from it. Lastly, Section 3.5 explains the effects of the data distributional departures on the parameter estimates, and its importance for this research.
The specific analyses for this study are elaborated from section Section 4 onwards. Particularly, Section 4 elaborates on the general context, gaps and main purpose of the study. Section 5 introduces the research questions that guide this study. Section 6 explores the data and its implications. Section 7 thoroughly develops the methods to analyze the data. Section 8 provides answers to the research question at hand. Section 9 discusses the findings, limitations and future research derived from this study. Lastly, Section 10 provides the concluding thoughts for the study.
The R packages utilized in the production of this document can be divided in three groups. First, the packages utilized to generate the walk-through: RColorBrewer (Neuwirth, 2022) and quarto (Allaire, Teague, Scheidegger, Xie, & Dervieux, 2022). Second, the packages used for the handling the data: stringr (H. Wickham, 2022), dplyr (Hadley Wickham, François, Henry, Müller, & Vaughan, 2023), tidyverse (Hadley Wickham et al., 2019), and reshape2 (H. Wickham, 2007). Lastly, the packages used for the Bayesian estimation: coda (Plummer, Best, Cowles, & Vines, 2006), loo (A. Vehtari et al., 2023; A. Vehtari, Simpson, Gelman, Yao, & Gabry, 2021b), cmdstanr (Gabry & Češnovar, 2022), rstan (Stan Development Team, 2020), runjags (Denwood, 2016), and rethinking (McElreath, 2021).
Bayesian inference is an approach to statistical modeling and inference that is primarily based on the Bayes’ theorem. The procedure aims to derive appropriate inference statements about a set of parameters by revising and updating their occurrence probabilities in light of new evidence (Everitt & Skrondal, 2010). The procedure consists of defining the model assumptions in the form of a likelihood for the outcome and a set of prior distributions for the parameters of interest. Upon observing empirical data, these priors undergo updating to posterior distributions following Bayes’ rule (Jeffreys, 1998), from which the statistical inferences are derived 1. As an example, a simple linear regression model with a parameter \beta can be encoded under the Bayesian inference paradigm in the following form:
Bayesian inference
Approach to statistical modeling and inference, that aims to derive appropriate inference statements about one or a set of parameters by revising and updating their probabilities in light of new evidence (Everitt & Skrondal, 2010).
\begin{align*} P(\beta | Y, X ) &= \frac{ P( Y | \beta, X ) \cdot P( \beta ) }{ P( Y ) } \end{align*} \tag{1}
where P( Y| \beta, X ) defines the likelihood of the outcome, which represents the assumed probability distribution for the outcome Y, given the parameter \beta and covariate X. This is the distribution that describes the assumption about the underlying processes that give rise to the data (Everitt & Skrondal, 2010). P( \beta ) defines the prior distribution of the parameter \beta. A prior is a probability distribution summarizing the information about a parameter known or assumed before observing any empirical data (Everitt & Skrondal, 2010). Lastly, P( Y ) defines the probability distribution of the data, which represents the evidence of the observed empirical data. Consequently, the posterior distribution of the parameter P( \beta | Y, X ) describes the probability distribution of \beta after observing empirical data.
Likelihood
probability distribution that describes the assumption about the underlying processes that give rise to the data (Everitt & Skrondal, 2010).
Prior distribution
Probability distribution summarizing the information about a parameter known or assumed before observing any empirical data (Everitt & Skrondal, 2010).
Posterior distribution
Probability distribution summarizing the information about a parameter after observing empirical data (Everitt & Skrondal, 2010).
Before implementing Bayesian inference procedures, two important concepts related to Equation 1 need to be understood. First, the evidence of the empirical data P(Y) serves as a normalizing constant. This just says that the numerator in the equation is re-scaled by a constant obtained from calculating P(Y). Consequently, without loosing generalization, the equation can be succinctly rewritten in the following form:
\begin{align*} P(\beta | Y, X ) &\propto P( Y | \beta, X ) \cdot P( \beta ) \\ \end{align*} \tag{2}
where \propto denotes the proportional symbol. This implies that the posterior distribution of \beta is proportional (up to a constant) to the multiplication of the outcome’s likelihood and the parameter’s prior distribution. This definition makes the calculation of posterior distributions easier, by separating the parameter’s updating process from the integration of new empirical data (this will be clearly seen in the code provided in Section 3.1.3).
Second, a dataset usually has multiple observations of the outcome Y and covariates X, in the form of y_{i} and x_{i}. Therefore, by law of probabilities, and assuming independence among the observations, the likelihood of a dataset can be rewritten as the product of all individual likelihoods. Consequently, Equation 2 can also be rewritten as follows:
\begin{align*} P(\beta | Y, X ) &\propto \prod_{i=1}^{n} P( y_{i} | \beta, x_{i} ) \cdot P( \beta ) \end{align*} \tag{3}
Several methods within the Bayesian inference procedures can be utilized to estimate the posterior distribution of the parameter, and most of these fall into the category of Markov Chain Monte Carlo methods (MCMC). MCMC are methods to indirectly simulate random observations from probability distributions using stochastic processes (Everitt & Skrondal, 2010) 2. However, when the parameters of interest are not large in number, a useful pedagogical method to produce the posterior distribution is the grid approximation method. Through this method, an excellent approximation of the parameter’s posterior distribution can be achieved by considering a finite candidate list of parameter values. This method is used in Section 3.1.3 to illustrate how the Bayesian inference works 3.
Markov Chain Monte Carlo (MCMC)
Methods to indirectly simulate random observations from probability distributions using stochastic processes (Everitt & Skrondal, 2010).
Grid approximation
Method to indirectly simulate random observations from low dimensional continuous probability distributions, by considering a finite candidate list of parameter values (McElreath, 2020).
A simple Bayesian linear regression model can be written in the following form:
\begin{align*} y_{i} &= \beta \cdot x_{i} + e_{i} \\ e_{i} &\sim \text{Normal}( 0, 1 ) \\ \beta &\sim \text{Uniform}( -20, +20 ) \end{align*} where y_{i} denotes the outcome’s observation i, \beta the expected effect of the observed covariate x_{i} on the outcome, and e_{i} the outcome’s residual in observation i. Furthermore, the residuals e_{i} are assumed to follow a normal distribution with mean zero and standard deviation equal to one. Lastly, prior to observe any data, it is assumed that \beta is uniformly distributed within the range of [-20,+20]. This prior implies that equal probabilities are assigned to all values of \beta within this range.
However, a more convenient generalized manner to represent the same linear regression model is as follows:
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, +20 ) \end{align*} In this definition, the component of the Bayesian inference procedure detailed in Section 3.1.1 are more easily spotted. First, about the likelihood, the outcome is assumed to be normally distributed with mean \mu_{i} and standard deviation equal to one. Second, it is assumed that \beta has a uniform prior within the range of [-20,+20]. Moreover, the equations reveal that the mean of the outcome \mu_{i} is modeled by a linear predictor composed of the covariate x_{i} and its effect on the outcome \beta.
For illustration purposes, a simulated regression with n=100 observations was generated assuming \beta=0.2. Figure 1 shows the scatter plot of the generated data (see code below). The grid approximation method is used to generate random observations from the posterior distribution of \beta. Two noteworthy results emerge from the approach. Firstly, once the posterior distribution is generated, various summaries can be used to make inferences about the parameter of interest (refer to the code output below). Secondly, when considering a dataset with n=100 observations, the influence of the prior on the posterior distribution of \beta is negligible. Specifically, prior to observe any data, assuming that \beta could take any value within the range of [-20,+20] with equal probability (left panel of Figure 2) did not have a substantial impact on the distribution of \beta after empirical data was observed (right panel of Figure 2).
set.seed(12345)
n = 100
b = 0.2
x = rnorm( n=n, mean=0, sd=1 )
mu_y = b*x
y = rnorm( n=n, mean=mu_y, sd=1 )# grid approximation
Ngp = 1000
b_cand = seq( from=-20, to=20, length.out=Ngp )
udf = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )
udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )
b_prior = rep( 1/40, length(b_cand) )
b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)paste0( 'true beta = ', b )
b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )
b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )
b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )
b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.299"
[1] "estimated beta (maximum probability) = 0.3"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
plot( x, y, xlim=c(-3,3), ylim=c(-3,3),
pch=19, col=rgb(0,0,0,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue' )
abline( a=0, b=b_exp, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1),
main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2,
col=c('gray','blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,1))Prior to observing empirical data, assuming the parameter could take any value within within the range of [-20,+20] with equal probability is not the only prior assumption that can be made. Different levels of uncertainty associated with a parameter can be encoded by different priors. This concept illustrated with Figure 3 through Figure 5, where three different types of priors are used to encode three levels of uncertainty about the parameter \beta.
# grid approximation
Ngp = 1000
post = data.frame( b_cand=seq( from=-20, to=20, length.out=Ngp ) )
ud_func = function(i){ post$b_cand[i]*x }
mu_y = sapply( 1:length(post$b_cand), ud_func )
ud_func = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(post$b_cand), ud_func )
post$b_prior1 = rep( 1/40, length(post$b_cand) )
post$b_prior2 = dnorm( post$b_cand, mean=0, sd=0.5 )
post$b_prior3 = dnorm( post$b_cand, mean=0.2, sd=0.05 )
nam = c()
for( i in 1:3 ){
b_prop = y_lik * post[, paste0('b_prior',i) ]
nam = c(nam, paste0('b_post',i) )
post = cbind(post, data.frame( b_prop / sum(b_prop) ) )
}
names(post)[5:7] = namFirst, the distribution depicted in Figure 3 assumes \beta \sim \text{Uniform}(-20, +20) (similar to what is observed in Section 3.1.3). The distribution does not restrain the effect of \beta to be more probable in any range within [-20, +20]. This type of distribution is commonly referred to as a non-informative prior. A non-informative prior reflects reflects the distributional commitment of a parameter to a wide range of values within a specific parameter space (Everitt & Skrondal, 2010).
Non-informative priors
Prior that reflects the distributional commitment of a parameter to a wide range of values within a specific parameter space (Everitt & Skrondal, 2010).
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior1')], type='l',
xlim=c(-1.5,1.5), main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post1')], type='l',
xlim=c(-1,1), main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))Second, the distribution described in Figure 4 assumes \beta \sim \text{Normal}(0, 0.5). Consequently, the effect of \beta is more probable within the range [-1,+1], with less probability associated with parameter values outside this range. This is a an example of a weakly-informative prior distribution. Weakly informative priors reflect the distributional commitment of a parameter to a weakly constraint range of values within a realistic parameter space (McElreath, 2020).
Weakly informative priors
Prior that reflects the distributional commitment of a parameter to a weakly constraint range of values within a realistic parameter space (McElreath, 2020).
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior2')], type='l',
xlim=c(-1.5,1.5), main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post2')], type='l',
xlim=c(-1,1), main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))Third, the distribution described in Figure 5 assumes \beta \sim \text{Normal}(0.2, 0.05). As a result, the effect of \beta is more probable within the range [0.1,0.3], with less probability associated with parameter values outside this range. This is an example of an informative prior distribution. Informative priors are distributions that expresses specific and definite information about a parameter (McElreath, 2020).
Informative priors
Prior distributions that that expresses specific and definite information about a parameter (McElreath, 2020).
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior3')], type='l',
xlim=c(-1.5,1.5), main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post3')], type='l',
xlim=c(-1,1), main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))Lastly, regarding the influence of different priors on the posterior distributions, Figure 3 and Figure 4 reveals that non-informative and weakly-informative priors have a negligible influence on the posterior distribution. Both priors result in similar posteriors. Furthermore, the figure shows the data sample size n=100 is still not enough to provide an unbiased and precise estimation of the true effect. In contrast, Figure 5 shows that, informative priors can have a meaningful influence in the posterior distribution. In this particular case, the prior helps to estimate an unbiased and more precise effect. This results shows that when the data sample size is not sufficiently large, the prior assumptions can play a significant role on obtaining appropriate parameter estimates.
In cases requiring greater modeling flexibility, a more refined representation of the parameters’ priors can be defined in terms of hyperparameters and hyperpriors. Hyperparameters refer to parameters indexing a family of possible prior distributions for the original parameter, while hyperpriors are prior distributions for such hyperparameters (Everitt & Skrondal, 2010).
Hyperparameters
Parameters \theta_{2} that indexes a family of possible prior distributions for another parameter \theta_{1} (Everitt & Skrondal, 2010).
Hyperpriors
Prior distributions for hyperparameters (Everitt & Skrondal, 2010).
A simple example of the use of hyperpriors would be to define the regression model shown in Section 3.1.3 in the following form:
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Normal}( 0, \text{exp}(v) ) \\ v &\sim \text{Normal}(0, 3) \end{align*} where v define the hyperparameter for the parameter \beta, and its associated distribution define its hyperprior.
However, setting prior distributions through hyperparameters brings its own challenges. One notable challenge pertains to the geometry of the parameter’s sample space. This implies that prior probabilistic representations defined in terms of hyperparameters sometimes exhibit complex sample geometries compared to simple priors 4. The reparametrization of hyperpriors into such simpler sample geometries leads to the notion of non-centered priors. Non-centered priors express a parameter’s prior distribution in terms of a hyperparameter, which in turn are defined by a transformation of the original parameter of interest (Gorinova et al., 2019). By incorporating non-centered priors, researchers can ensure the reliability of certain posterior distributions within Bayesian inference procedures. To illustrate, a straightforward example of a non-centered reparametrization of a prior can be demonstrated as follows:
Non-centered priors
Expression of a parameter’s distribution in terms of an hyperparameter defined by a transformation of the original parameter of interest (Gorinova et al., 2019).
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &= z \cdot \text{exp}(v) \\ v &\sim \text{Normal}(0, 3) \\ z &\sim \text{Normal}( 0, 1 ) \end{align*} where z is a hyperparameter sampled independently from v, and the parameter of interest \beta is obtained as a transformation of the two hyperparameters. Figure 6 illustrates the differences in sampling geometries between a centered and a non-centered parametrization. It is evident that the sampling geometry depicted in the left panel of the figure is narrower than the one depicted in the right panel, and as a result, Bayesian inference procedures can have a harder time sampling from the former than the latter distributions.
n = 5000
v = rnorm( n=n, mean=0, sd=1 )
z = rnorm( n=n, mean=0, sd=1 )
b_cent = rnorm( n=n, mean=0, sd=exp(v) )
b_non = z*exp(v)par( mfrow=c(1,2) )
plot( b_cent, v, pch=19, col=rgb(0,0,0,alpha=0.1),
xlab=expression(beta), ylab=expression(v),
main='Centered parametrization' )
plot( z, v, pch=19, col=rgb(0,0,0,alpha=0.1),
xlab=expression(z), ylab=expression(v),
main='Non-centered parametrization' )
par( mfrow=c(1,1) )The selection of the Bayesian approach was based on three key properties. Firstly, empirical evidence from prior research demonstrates that Bayesian methods outperform frequentist methods, particularly in handling complex and over-parameterized models (Baker, 1998; Kim & Cohen, 1999). This superiority is evident when dealing with complex models, like the proposed GLLAMM, that are challenging to program or are not viable under frequentist methods (Depaoli, 2014).
Secondly, the approach allows for the incorporation of prior information, ensuring that certain parameters are confined within specified boundaries. This helps mitigate non-convergence or improper parameter estimation issues commonly observed in complex models under frequentist methods (Martin & McDonald, 1975; Seaman & Stamey, 2011). In this study, for example, this property was leveraged to incorporate information about the variances of random effects and constrain them to be positive.
Lastly, the Bayesian approach demonstrates proficiency in handling relatively small sample sizes (Baldwin & Fellingham, 2013; Depaoli, 2014; Lambert, Sutton, Burton, Abrams, & Jones, 2006). In this case, despite the study dealing with 2,263 entropy scores, these were derived from a modest sample size of 32 speakers, from whom the inferences are drawn. Consequently, reliance on the asymptotic properties of frequentist methods may not be warranted in this context, underscoring the pertinence of this property to the current study.
Benefits of Bayesian inference procedures
More suitable to deal with:
A normal distribution is a type of continuous probability distribution in which a random variable can take on values along the real line \left( y_{i} \in [-\infty, \infty] \right). The distribution is characterized by two independent parameters: the mean \mu and the standard deviation \sigma (Everitt & Skrondal, 2010). Thus, a random variable can take on values that are gathered around a mean \mu, with some values dispersed based on some amount of deviation \sigma, without any restriction. Importantly, by definition of the normal distribution, the location (mean) of the distribution does not influence its spread (deviation).
Figure 7 illustrates how the distribution of an outcome changes with different values of \mu and \sigma. The left panel demonstrate that the distribution of the outcome can shift in terms of its location based on the value of \mu. The right panel shows how the distribution of the outcome can become narrower or wider based on the values of \sigma. It is noteworthy that alterations in the mean \mu of the distribution have no impact on its standard deviation \sigma.
require(rethinking)
mu = c(-1.5, 0, 1.5)
sigma = c(1.5, 1, 0.5)
par(mfrow=c(1,2))
cp = sapply( 1:length(mu), col.alpha, alpha=0.7)
for(i in 1:length(mu)){
if(i==1){
curve( dnorm(x, mean=mu[i], sd=1),
from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
xlab="outcome values", ylab="density")
abline(v=mu, col='gray', lty=2)
legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
legend=expression( mu[1]==-1.5,
mu[2]==0,
mu[3]==+1.5,
sigma==1) )
} else{
curve( dnorm(x, mean=mu[i], sd=1),
from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
xlab="", ylab="", add=T )
}
}
cp = sapply( 1:length(sigma), col.alpha, alpha=0.7)
for(i in 1:length(sigma)){
if(i==1){
curve( dnorm(x, mean=0, sd=sigma[i]),
from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
xlab="outcome values", ylab="density")
abline(v=0, col='gray', lty=2)
legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
legend=expression( sigma[1]==1.5,
sigma[2]==1,
sigma[3]==0.5,
mu==0) )
} else{
curve( dnorm(x, mean=0, sd=sigma[i]),
from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
xlab="", ylab="", add=T )
}
}
par(mfrow=c(1,1))A beta-proportion distribution is a type of continuous probability distribution in which a random variable can assume values within the continuous interval between zero and one \left( y_{i} \in [0, 1] \right). The distribution is characterized by two parameters: the mean \mu and the sample size M (Everitt & Skrondal, 2010). This implies that a random variable can take on values restricted within the unit interval, centered around a mean \mu, with some values being more dispersed based on the sample size M. Additionally, two characteristic define the distribution. Firstly, like the random variable, the mean of the distribution can only take values within the unit interval (\mu \in [0,1]). Secondly, the mean and sample size parameters are no longer independent of each other.
Figure 8 illustrates how an outcome with a beta-proportion distribution changes with different values of \mu and M. The figure reveals two prevalent patterns in the distribution: (1) the behavior of the dispersion, as measured by the sample size, depends on the mean of the distribution, and (2) the larger the sample size, the less dispersed the distribution is within the unit interval.
require(rethinking)
mu = c(0.2, 0.5, 0.8)
M = c(2, 5, 20)
par(mfrow=c(1,2))
cp = sapply( 1:length(mu), col.alpha, alpha=0.7)
for(i in 1:length(mu)){
if(i==1){
curve( dbeta2(x, prob=mu[i], theta=10),
from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
xlab="outcome values", ylab="density")
abline(v=mu, col='gray', lty=2)
legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
legend=expression( mu[1]==0.2,
mu[2]==0.5,
mu[3]==0.8,
M==10) )
} else{
curve( dbeta2(x, prob=mu[i], theta=10),
from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
xlab="", ylab="", add=T )
}
}
cp = sapply( 1:length(M), col.alpha, alpha=0.7)
for(i in 1:length(M)){
if(i==1){
curve( dbeta2(x, prob=0.3, theta=M[i]),
from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
xlab="outcome values", ylab="density")
abline(v=0.3, col='gray', lty=2)
legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
legend=expression( M[1]==2,
M[2]==5,
M[3]==20,
mu==0.3) )
} else{
curve( dbeta2(x, prob=0.3, theta=M[i]),
from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
xlab="", ylab="", add=T )
}
}
par(mfrow=c(1,1))It is crucial to comprehend what signifies for an outcome to follow a normal distribution, as the assumption of normally distributed outcomes is ubiquitous in speech intelligibility research (see Boonen, Kloots, Nurzia, & Gillis, 2021; Flipsen, 2006; Lagerberg, Asberg, Hartelius, & Persson, 2014).
In contrast, the significance of the beta-proportion distribution lies in providing a suitable alternative for modeling non-normally bounded distributed outcomes, such as the entropy scores utilized in this study. Boundedness refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur (Lebl, 2022). Neglecting the bounded nature of an outcome can lead, at best, to underfitting, and, at worse, to misspecification. Underfitting occurs when statistical models fail to capture the underlying data patterns, potentially causing the generation of predictions outside the data range, hindering the model’s inability to generalize its results when confronted with new data. Conversely, misspecification, marked by a poor representation of relevant aspects of the true data in the model’s functional form or covariates inclusion, can lead to inconsistent and inefficient parameters estimates (Everitt & Skrondal, 2010).
Boundedness
Refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur (Lebl, 2022)
Underfitting
Occurs when statistical models fail to capture the underlying data patterns, potentially causing the generation of predictions outside the data range, hindering the model’s inability to generalize its results when confronted with new data (Everitt & Skrondal, 2010).
Misspecification
Occurs when the model’s functional form or inclusion of covariates poorly represent relevant aspects of the true data. This can lead to inconsistent and inefficient parameters estimates (Everitt & Skrondal, 2010).
An ordinary linear mixed model (LMM) is a procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations, and one or more covariates (Holmes, Bolin, & Kelley, 2019). A commonly know Bayesian probabilistic representation of an ordinary LMM can be expressed as follows:
Ordinary linear mixed model (LMM)
Procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations, and one or more covariates (Holmes et al., 2019).
\begin{align*} y_{ib} &= \beta x_{i} + a_{b} + \varepsilon_{ib} \\ \varepsilon_{ib} &\sim \text{Normal}(0, 1) \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \end{align*}
where y_{ib} denotes the outcome’s i’th observation clustered in block b, and x_{i} denotes the covariate for observation i. Moreover, \beta denote the fixed slope of the regression. Furthermore, a_{b} denotes the random effects, and \varepsilon_{ib} defines the random outcome residuals. Furthermore, the residuals \varepsilon_{ib} are assumed to be normally distributed with mean zero and standard deviation equal to one. Additionally, prior to observing any data, \beta is assumed to be normally distributed with mean zero and standard deviation equal to 0.5. Similarly, a_{b} is assumed to be normally distributed with mean zero and standard deviation equal to one.
A generalized linear mixed model (GLMM) are a set of models used to estimate (non)linear relationship between the mean of a (non)normally distributed outcome with clustered observations, and one or more covariates (Lee & Nelder, 1996). Interestingly, the ordinary Bayesian LMM detailed in the previous section can be represented as a special case of GLMM, as follows:
Generalized linear mixed model (GLMM)
Procedure employed to estimate (non)linear relationship between the mean of a (non)normally distributed outcome with clustered observations, and one or more covariates (Lee & Nelder, 1996).
\begin{align*} y_{ib} &\sim \text{Normal}( \mu_{ib}, 1) \\ \mu_{ib} &= \beta x_{i} + a_{b} \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \\ \end{align*}
Notice this representation explicitly highlights the three components of a GLMM: the likelihood component, the linear predictor, and the link function (McElreath, 2020). The likelihood component specifies the assumption about the distribution of an outcome, in this case a normal distribution with mean \mu_{ib} and standard deviation equal to one. The linear predictor specifies the manner in which the covariate will predict the mean of the outcome. In this case the linear predictor is a linear combination of the parameter \beta, the covariate x_{i}, and the random effects a_{b}. The link function specifies the relationship between the mean of the outcome \mu_{ib} and the linear predictor. In this case no transformation is applied to the linear predictor to match its range with the range of the outcome, as both can take on values within the real line (refer to Section 3.2.1). Lastly, resulting from the use of Bayesian procedures, a fourth component can be added to any GLMM: the prior distributions. The priors describe what is known about the parameters \beta and a_{b} before observing any empirical data.
GLMM components
On the other hand, a Beta-proportion LMM is also a GLMM, and it can be represented probabilistically as follows:
\begin{align*} y_{ib} &\sim \text{BetaProp}( \mu_{ib}, 10 ) \\ \mu_{ib} &= \text{logit}^{-1}( \beta x_{i} + a_{b} ) \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \\ \end{align*}
Notice the representation also highlights the three components of a GLMM; however, their assumptions are now slightly different. The likelihood component assumes a beta-proportion distribution for the outcome with mean \mu_{ib} and sample size equal to 10. The linear predictor is still a linear combination of the parameter \beta, the covariate x_{i}, and the random intercepts a_{b}. However, the link function now assumes the mean of the outcome is (non)linearly related to the linear predictor by a inverse-logit function: \text{logit}^{-1}(x) = exp(x) / (1+exp(x)). The inverse-logit function allows the linear predictor to match the range observed in the mean of the beta-proportion distribution \mu_{ib} \in [0,1] (refer to Section 3.2.2). Lastly, the additional fourth component resulting from using Bayesian procedures, the prior assumptions for \beta and a_{b} are also declared.
Understanding LMM is essential due to the ubiquitous assumption of normally distributed outcomes within the speech intelligibility research field (see Boonen et al., 2021; Flipsen, 2006; Lagerberg et al., 2014). Furthermore, their significance also lies in their ability to model clustered outcomes. Clustering occurs when multiple observations arise from the same individual, location, or time (McElreath, 2020). Accounting for data clustering is essential, as disregarding it may result in biased and inefficient parameter estimates. Consequently, such biases and inefficiencies can diminish statistical power or increase the likelihood of committing a type I error. Statistical power defines the model’s ability to reject the null hypothesis when it is false (Everitt & Skrondal, 2010). Type I error occurs when a null hypothesis is erroneously rejected (Everitt & Skrondal, 2010).
Clustering
Occurs when multiple observations arise from the same individual, location, or time (McElreath, 2020).
Statistical power
The model’s ability to reject the null hypothesis when it is false (Everitt & Skrondal, 2010).
Type I error
The error that results when a null hypothesis is erroneously rejected (Everitt & Skrondal, 2010).
Moreover, the significance of GLMM lies in offering the same benefits as the LMMs, in terms of parameter unbiasedness and efficiency. However, the framework also allows for the modeling of (non)linear relationships of (non)normally distributed outcomes. This is particularly important for modeling bounded data, such as the entropy scores utilized in this study. Refer to Section 3.2.3 to understand the importance of considering the bounded nature of the data in the modeling process.
Measurement error refers to the disparity between the observed values of a variable, recorded under similar conditions, and some fixed true value which is not directly observable (Everitt & Skrondal, 2010). The problem of measurement error in an outcome is easier to understand with a motivating example. Using a similar model as the one depicted in Section 3.1.3, the probabilistic representation of measurement error in the outcome can be depicted as follows:
Latent variables
It refers to the disparity between the observed values of a variable, recorded under similar conditions, and some fixed true value which is not directly observable (Everitt & Skrondal, 2010).
\begin{align*} \tilde{y}_{i} &\sim \text{Normal}( y_{i}, s ) \\ y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, 20 ) \end{align*} This representation effectively means that a manifest outcome \tilde{y}_{i} is assumed to be normally distributed with a mean equal to the latent outcome y_{i} and a measurement error s. The latent outcome y_{i} is also assumed to be normally distributed but with a mean \mu_{i} and a standard deviation of one. The mean of the latent outcome is considered to be explained by a linear combination of the covariate x_{i} and its expected effect \beta. Lastly, prior to observing any data, \beta is assumed to follow a uniform distribution within the range of [-20, +20], representing a non-informative prior.
For illustrative purposes, a simulated outcome with n=100 observations was generated, assuming \beta=0.2, and a measurement error of s=2. Figure 9 shows the scatter plot of the generated data (see code below). The left panel of the figure demonstrates that the manifest outcome has a larger spread than the latent outcome depicted in the right panel. As a result, although \beta is expected to be estimated in an unbiased manner, the statistical hypothesis tests for the parameter will likely be affected due to this larger variability.
The estimation output confirms the previous hypothesis. The posterior distribution of \beta, estimated using the manifest outcome, has a larger standard deviation than the one estimated using the appropriate latent outcome (see Figure 10 and code output below). Furthermore, the code output shows the parameter’s posterior distribution can no longer reject the null hypothesis at confidence levels of 90\% and 95\%, indicating a reduced statistical power.
set.seed(12345)
n = 100
b = 0.2
x = rnorm( n=n, mean=0, sd=1 )
mu_y = b*x
y = rnorm( n=n, mean=mu_y, sd=1 )
s = 2
y_tilde = rnorm( n=n, mean=y, sd=s )# grid approximation
Ngp = 1000
b_cand = seq( from=-20, to=20, length.out=Ngp )
udf = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )
udf = function(i){ prod( dnorm( y_tilde, mean=mu_y[,i], sd=s ) ) }
y_lik_man = sapply( 1:length(b_cand), udf )
udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik_lat = sapply( 1:length(b_cand), udf )
b_prior = rep( 1/40, length(b_cand) )
b_prop_man = y_lik_man * b_prior
b_post_man = b_prop_man / sum(b_prop_man)
b_prop_lat = y_lik_lat * b_prior
b_post_lat = b_prop_lat / sum(b_prop_lat)paste0( 'true beta = ', b )
# manifest outcome
b_exp_man = sum( b_cand * b_post_man )
paste0( 'estimated beta (expectation on manifest) = ',
round(b_exp_man, 3) )
b_var_man = sqrt( sum( ( (b_cand-b_exp_man)^2 ) * b_post_man ) )
paste0( 'estimated beta (standard deviation on manifest) = ',
round(b_var_man, 3) )
# latent outcome
b_exp_lat = sum( b_cand * b_post_lat )
paste0( 'estimated beta (expectation on latent) = ',
round(b_exp_lat, 3) )
b_var_lat = sqrt( sum( ( (b_cand-b_exp_lat)^2 ) * b_post_lat ) )
paste0( 'estimated beta (standard deviation on latent) = ',
round(b_var_lat, 3) )
# null hypothsis rejection
b_prob_man = sum( b_post_man[ b_cand > 0 ] )
paste0( 'P(estimated beta on manifest > 0) = ',
round(b_prob_man, 3) )
b_prob_lat = sum( b_post_lat[ b_cand > 0 ] )
paste0( 'P(estimated beta on latent > 0) = ',
round(b_prob_lat, 3) )[1] "true beta = 0.2"
[1] "estimated beta (expectation on manifest) = 0.212"
[1] "estimated beta (standard deviation on manifest) = 0.176"
[1] "estimated beta (expectation on latent) = 0.299"
[1] "estimated beta (standard deviation on latent) = 0.088"
[1] "P(estimated beta on manifest > 0) = 0.887"
[1] "P(estimated beta on latent > 0) = 1"
par( mfrow=c(1,2) )
plot( x, y_tilde, xlim=c(-3,3), ylim=c(-7,7),
pch=19, col=rgb(0,0,0,alpha=0.3),
ylab=expression(tilde(y)),
main='manifest outcome' )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp_man, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
plot( x, y, xlim=c(-3,3), ylim=c(-7,7),
pch=19, col=rgb(0,0,0,alpha=0.3),
main='latent outcome' )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp_lat, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par( mfrow=c(1,1) )par(mfrow=c(1,2))
plot( b_cand, b_post_man, type='l', xlim=c(-0.5,1),
main='Posterior on manifest outcome',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp_man), lty=2, col=c('gray', 'blue', 'red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
plot( b_cand, b_post_lat, type='l', xlim=c(-0.5,1),
main='Posterior on latent outcome',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp_lat), lty=2, col=c('gray', 'blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,1))Latent variables can be used to address the problem arising from the larger observed variability in one or more manifest outcomes. A latent variable is a variable that cannot be directly measured but is assumed to be primarily responsible for the variability in one or more manifest variables (Everitt & Skrondal, 2010). Latent variables can be interpreted as hypothetical constructs, traits, or true variables that account for the variability that induce dependence in one or more manifest variables (Rabe-Hesketh, Skrondal, & Pickles, 2004). This concept is akin to a linear mixed model, where the random effects serve to account for the variability that induces dependence within clustered outcomes (Rabe-Hesketh et al., 2004) (refer to Section 3.3). The most widely known examples of latent variable models include Confirmatory Factor Analysis and Structural Equation Models (CFA and SEM, respectively).
Latent variables
Variables that cannot be measured directly but are assumed to be the principal responsible for the common variability in one or more manifest variables (Everitt & Skrondal, 2010).
Commonly, latent variable models consist of two parts: a measurement part and a structural part. In the measurement part, the principles of the Thurstonian model (Luce, 1959; Thurstone, 1927) are employed to aggregate one or more manifest variables and estimate a latent variable. In the structural part, regression-like relationships among latent and other manifest variables are specified, allowing researchers to test hypotheses about their (causal) relationships (Hoyle, 2014). While the measurement part is sometimes of interest in its own right, the substantive model of interest is often defined by the structural part (Rabe-Hesketh et al., 2004).
It becomes evident that when an outcome is measured with error, the estimation procedures based on standard assumptions yield inefficient parameter estimates. This implies that the parameters are not estimated with sufficient precision. Consequently, such inefficiency can reduce statistical power and increase the likelihood of committing a type II error, which occurs when a null hypothesis is erroneously accepted (Everitt & Skrondal, 2010).
Type II error
The error that results when a null hypothesis is erroneously accepted (Everitt & Skrondal, 2010).
Therefore, the issue of measurement error in an outcome is highly relevant to this study. This research assumes that a speaker’s (latent) potential intelligibility contributes, in part, to the observed variability in the speaker’s (manifest) entropy scores. Given the interest in testing hypotheses about the potential intelligibility of speakers, and considering that the entropy scores are subject to measurement error, it becomes necessary to use latent variables to generate precise parameter estimates to test the hypothesis of interest.
In the context of regression analysis, heteroscedasticity occurs when the variance of an outcome depends on the values of another variable (Everitt & Skrondal, 2010). The opposite case is called homoscedasticity. An example of heteroscedasticity can be probabilistically represented as follows:
Heteroscedasticity
Occurs when the variance (standard deviation) of an outcome depends on the values of another variable. The opposite case is called homoscedasticity (Everitt & Skrondal, 2010).
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, \sigma_{i} ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \sigma_{i} &= exp( \gamma \cdot x_{i} ) \\ \beta &\sim \text{Uniform}( -20, 20 ) \\ \gamma &\sim \text{Uniform}( -20, 20 ) \end{align*} This representation implies that an outcome y_{i} is assumed normally distributed with mean \mu_{i} and a standard deviation \sigma_{i}. Furthermore, the mean and standard deviation of the outcome is explained by the covariate x_{i}, through the parameters \beta and \gamma. Lastly, prior to observing any data, \beta and \gamma are assumed to be uniformly distributed in the range of [-20,+20].
Figure 11 illustrate the presence of heteroscedasticity using the previous representation, assuming a sample size of n=100, and parameters \beta=0.2 and \gamma=1. Notice the variability of the outcome increases as the covariate also increases. Consequently, it is easy to intuit that this difference in the outcome’s variability could have and impact on the statistical hypothesis tests of \beta, and even in the estimate itself. To prove the intuition, an incorrect model is used to estimate \beta.
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, 20 ) \\ \end{align*} As a result, the hypotheses are proven accurate. When an outcome is erroneously assumed homoscedastic, the parameter estimates not only become inefficient but also are not estimated closer to the true value, as seen in the output code below and in Figure 12.
set.seed(12345)
n = 100
b = 0.2
g = 1
x = rnorm( n=n, mean=0, sd=1 )
mu_y = b*x
s_y = exp(g*x)
y = rnorm( n=n, mean=mu_y, sd=s_y )# grid approximation
Ngp = 1000
b_cand = seq( from=-20, to=20, length.out=Ngp )
udf = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )
udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )
b_prior = rep( 1/40, length(b_cand) )
b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)paste0( 'true beta = ', b )
b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )
b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )
b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )
b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.638"
[1] "estimated beta (maximum probability) = 0.621"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
plot( x, y, xlim=c(-3,3), ylim=c(-6,6),
pch=19, col=rgb(0,0,0,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp, lty=2, col='red' )
abline( a=-4, b=-1, lty=2, col=rgb(0,0,0,alpha=0.3))
abline( a=4.4, b=1.5, lty=2, col=rgb(0,0,0,alpha=0.3))
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1),
main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2,
col=c('gray','blue', 'red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,2) )
par(mfrow=c(1,1))In regression analysis, outliers are defined as observations that appear to deviate markedly from other sample data points in which they occur (Everitt & Skrondal, 2010). Although no unique probabilistic representation of outliers can be represented, a simple example can be illustrated with Figure 13. The figure depicts the presence of three influential observations in the outcome (colored blue). It is easier to intuit that with the presence of influential observations the parameter estimates, and the hypothesis test resulting from them, can be affected.
Outlier
Observation that appear to deviate markedly from other sample data points in which it occurs (Everitt & Skrondal, 2010).
The intuition is proven correct when \beta is estimated using the same incorrect model used in Section 3.5.1. When an outcome is erroneously assumed without outliers, the parameter value is estimated farther from the truth, as observed in the code output below and in Figure 14.
\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, 20 ) \\ \end{align*}
set.seed(12345)
n = 100
b = 0.2
x = rnorm( n=n, mean=0, sd=1 )
mu_y = b*x
y = rnorm( n=n, mean=mu_y, sd=1 )
idx = which( x>1 )
sel = 1:3
y[idx[sel]] = 6 # grid approximation
Ngp = 1000
b_cand = seq( from=-20, to=20, length.out=Ngp )
udf = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )
udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )
b_prior = rep( 1/40, length(b_cand) )
b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)paste0( 'true beta = ', b )
b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )
b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )
b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )
b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.477"
[1] "estimated beta (maximum probability) = 0.46"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
plot( x, y, xlim=c(-3,3), ylim=c(-6,6),
pch=19, col=rgb(0,0,0,alpha=0.3) )
points( x[idx[sel]], y[idx[sel]],
pch=19, col=rgb(0,0,1,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,3) )par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
main='Prior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1),
main='Posterior distribution',
xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2,
col=c('gray','blue', 'red') )
legend( 'topleft', legend=c('true', 'expected'),
bty='n', col=c('blue','red'), lty=rep(2,2) )
par(mfrow=c(1,1))As recommended by McElreath (2020), robust models can be used to deal with these types of distributional departures. Robust models are a general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate departures of the data from the model’s assumptions (Everitt & Skrondal, 2010). The procedure consist on modifying the statistical models to include traits that effectively make them robust to small departures from the distributional assumption, like heteroscedastic errors or to the presence of outliers.
Robust models
A general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate failures in the assumption of a model for (Everitt & Skrondal, 2010).
It is known that dealing with heteroscedasticity and the identification of outlier through preliminary univariate procedures is prone to the erroneous transformation or exclusion of valuable information. This can ultimately bias the parameter estimates, and even make them inefficient (McElreath, 2020). Bias refer to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated (Everitt & Skrondal, 2010).
Bias
It refer to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated (Everitt & Skrondal, 2010).
Dealing with the possibility of heteroscedasticity or outlying observations is relevant to the present study, because there is an interest in testing hypotheses about the potential intelligibility of speakers. Therefore, it is a necessity to considering the possibility of using robust regression models to assess these distributional departures and generate unbiased parameter estimates.
Intelligibility is at the core of successful, felicitous communication. Thus, being able to speak intelligibly is a major achievement in language acquisition and development. Furthermore, intelligibility is considered to be the most practical index to assess competence in oral communication (Kent, Miolo, & Bloedel, 19943). Consequently, it serves as a key indicator for evaluating the effectiveness of various interventions like speech therapy or cochlear implantation (Chin, Bergeson, & Phan, 2012). Speech intelligibility refers to the extent to which a listener can accurately recover the elements in an acoustic signal produced by a speaker, such as phonemes or words (Freeman, Pisoni, Kronenberger, & Castellanos, 2017; van Heuven, 2008; Whitehill & Chau, 2004). Studies that investigate intelligibility have utilized entropy scores to examine differences in children’s intelligibility, particularly between those with normal hearing and those with cochlear implants (Boonen et al., 2021).
Speech intelligibility
The extent to which a listener can accurately recover the elements in an acoustic signal produced by a speaker, such as phonemes or words (Freeman et al., 2017; van Heuven, 2008; Whitehill & Chau, 2004).
However, despite their potential as a fine-grained metric of intelligibility, as proposed by Boonen et al. (2021), they exhibit a statistical complexity that cautions researchers against treating them as straightforward indices of intelligibility. This complexity emerges from the processes of data collection and transcription aggregation, endowing the scores with four distinctive features: boundedness, measurement error, clustering, and the possible presence of outliers and heteroscedasticity. Firstly, entropy scores are confined to an interval between zero and one, a phenomenon known as boundedness (refer to Section 3.2). Secondly, entropy scores are a manifestation of a speaker’s intelligibility, with this intelligibility being the primary factor influencing the observed scores. This issue is commonly referred to as measurement error (refer to Section 3.4). Thirdly, due to the repeated assessment of speakers through multiple speech samples, the scores exhibit clustering (refer to Section 3.3). Lastly, driven by the specific set of speakers and speech samples under scrutiny, these scores often display a potential for the presence of outliers and heteroscedasticity (refer to Section 3.5).
Failure to collectively address these data features can result in numerous statistical challenges that might hamper the researcher’s ability to investigate intelligibility. Notably, neglecting boundedness can, at best, lead to underfitting and, at worst, to misspecification. Underfitting can cause the generation of inconsistent predictions, thus hindering the model’s ability to generalize when confronted with new data. Conversely, misspecification can lead to inconsistent and less efficient parameter estimates (refer to Section 3.2.3). Additionally, overlooking issues such as measurement error, clustering, outliers or heteroscedasticity can lead to biased and less precise parameter estimates, ultimately diminishing the statistical power of models and increasing the likelihood of committing type I or type II errors when addressing research inquiries (refer to Section 3.4.3, Section 3.3.3, and Section 3.5.4).
In the realm of computational statistics and data analysis, several models have been developed to address some of these data features individually and, at times, collectively. All of these models have found moderate adoption in various fields, including speech communication, psychology, education, health care, chemistry, and policy analysis. Specifically, in the domain of speech communication, Boonen et al. (2021) addressed data clustering within the context of intelligibility research. Conversely, de Brito Trindade et al. (2021) and Kangmennaang et al. (2023) concentrated on tackling non-normal bounded data with measurement error in covariates, within the context of chemical reactions and health care access, respectively. Remarkably, despite these individual efforts, there is, to the best of the authors’ knowledge, no study comprehensively addressing all of these data features in a principled way while also transparently and systematically documenting the Bayesian estimation of the resulting statistical models.
Considering the imperative need to comprehensively address all data features when investigating unobservable and complex traits, this investigation aims to demonstrate the efficacy of the Generalized Linear Latent and Mixed Model (GLLAMM) in handling entropy scores features when exploring research theories concerning speech intelligibility. To achieve this objective, the study will reexamine data originating from transcriptions of spontaneous speech samples, initially collected by Boonen et al. (2021). Subsequently, this data will be aggregated into entropy scores and subjected to modeling through the Bayesian Beta-proportion GLLAMM.
To address the primary objective, the study poses three key research questions. First, given the importance of accurate predictions in developing useful practical models and testing research theories (Shmueli & Koppius, 2011), Research Question 1 (RQ1) assesses whether the Beta-proportion GLLAMM yields more accurate predictions than the more prevalent Normal Linear Mixed Model (LMM) (Holmes et al., 2019). Second, acknowledging that intelligibility is an unobservable, intricate concept and a key indicator of oral communication competence (Kent et al., 19943), Research Question 2 (RQ2) investigates how the proposed model can estimate speakers’ latent intelligibility from manifest entropy scores. Thirdly, recognizing that research involves developing and comparing theories, Research Question 3 (RQ3) illustrates how these research theories can be examined within the model’s framework. Specifically, RQ3 assesses the influence of speaker-related factors on the newly estimated latent intelligibility.
The findings of this study will equip researchers investigating speech intelligibility using entropy scores, or those grappling with similar data challenges, with a statistical tool that improves upon existing research models. The tool will provide an assessment of the predictability of empirical phenomena, along with the capability to develop a quantitative measure for the latent variable of interest. The latter, in turn, could facilitate the appropriate comparison of existing theories related to the latent variable, and even the development of new ones.
The data comprised the transcriptions of spontaneous speech samples originally collected by Boonen et al. (2021). The data is not publicly available due to privacy restrictions. Nonetheless, the data can be provided by the corresponding author upon reasonable request.
Boonen et al. (2021) selected 32 speakers, comprising 16 normal hearing children (NH) and 16 hearing-impaired children with cochlear implants (HI/CI). At the time of the collection of the speech samples, the NH group were between 68 and 104 months old (M = 86.3, SD = 9.0), while HI/CI group were between 78 and 98 months old (M = 86.3, SD = 6.7).
Boonen and colleagues selected speech samples from a large corpus of children’s spontaneously spoken speech recordings. These recordings were obtained as the children narrated a story prompted by the picture book “Frog, Where Are You?” (Mayer, 1969) to a caregiver ‘unfamiliar with the story’. Before recording, the children were allowed to skim over the booklet and examine pictures. Prior to the selection process, the recordings were orthographically transcribed using the CHAT format in the CLAN editor (MacWhinney, 2020). These transcriptions were exclusively used in the curation of appropriate speech samples. To ensure the quality of the selection, Boonen and colleagues excluded sentences containing syntactically ill-formed or incomplete statements, background noise, crosstalk, long hesitations, revisions, or non-words. Finally, ten speech samples were randomly chosen for each of the 32 selected speakers. Each of these samples comprised a single sentence with a length of three to eleven words (M = 7.1, SD = 1.1). The process resulted in a total of 320 selected sentences collectively comprising 2,263 words.
Speech samples
Sentences with a length of three to eleven words (M = 7.1, SD = 1.1).
Boonen and colleagues recruited 105 students from the University of Antwerp. All participants were native speakers of Belgian Dutch and reported no history of hearing difficulties or prior exposure to the speech of hearing-impaired speakers.
The 320 speech samples and 105 listeners were randomly assigned to five blocks, with each block consisting of approximately 21 listeners who transcribed 64 sentences presented in random order. This resulted in a total of 47,514 transcribed words from the original 2,263 words present in the speech samples. These orthographic transcriptions were automatically aligned with a python script at the sentence level, in a column-like grid structure like the one presented in Table 1. This alignment process was repeated for each sentence within each speaker and block, and the output was manually checked and adjusted (if needed) in order to appropriately align the words. For more details on the random assignment and alignment procedures refer to Boonen et al. (2021).
Next, this study aggregated the aligned transcriptions by listener yielding 2,2634 entropy scores, one score per word. The entropy scores were calculated following Shannon’s formula (1948):
Entropy formula
\begin{equation} H_{wsib} = \frac{ \sum_{k=1}^{K} p_{k} \cdot log_{2}(p_{k}) }{ log_{2}(J)} \end{equation} \tag{4}
where H_{wsib} denotes the entropy scores confined to an interval between zero and one, with w defining the word index, s the sentence index, i the speaker index, and b the block index. Moreover, K describes the number of different word types within transcriptions, and J defines the total number of word transcriptions. Notice that by design, the total number of word transcriptions J corresponds with the number of listeners per block, i.e., 21 listeners. Lastly, p_{k} = \sum_{j=1}^{J} 1(T_{jk}) / J denotes the proportion of word types within transcriptions, with 1(T_{jk}) describing an indicator function that takes the value of one when the word type k is present in the transcription j.
These entropy scores served as the outcome variable, capturing agreement or disagreement among listeners’ word transcriptions. Lower scores indicated a higher degree of agreement between transcriptions and therefore higher intelligibility, while higher scores indicated lower intelligibility, due to a lower degree of agreement in the transcriptions (Boonen et al., 2021; Faes, De Maeyer, & Gillis, 2021). Furthermore, no score is excluded from the modeling process using univariate procedures, rather, the identification of highly influential observations is performed within the context of the proposed models, as recommended by McElreath (2020) (refer to Section 3.5).
Entropy interpretation
Lower scores indicated a higher degree of agreement between transcriptions and therefore higher intelligibility, while higher scores indicated lower intelligibility, due to a lower degree of agreement in the transcriptions (Boonen et al., 2021; Faes et al., 2021)
| Transcription | Words | ||||
|---|---|---|---|---|---|
| Number | 1 | 2 | 3 | 4 | 5 |
| 1 | de | jongen | ziet | een | kikker |
| the | boy | sees | a | frog | |
| 2 | de | jongen | ziet | de | [X] |
| the | boy | sees | the | [X] | |
| 3 | de | jongen | zag | [B] | kokkin |
| the | boy | saw | [B] | cook | |
| 4 | de | jongen | zag | geen | kikkers |
| the | boy | saw | no | frogs | |
| 5 | de | hond | zoekt | een | [X] |
| the | dog | searches | a | [X] | |
| Entropy | 0 | 0.3109 | 0.6555 | 0.8277 | 1 |
In this context, it is relevant to exemplify the entropy calculation procedure. For that purpose, the words in position two, four and five observed in Table 1 were used. These words were assumed present in the first sentence, produced by the first speaker assigned to the first block, and transcribed by five listeners (w=\{2,4,5\}, s=1, i=1, b=1, J=5). For the word 2, the first four listeners identified the word type jongen (T_{j1}), while the last identified the word type hond (T_{j2}). Therefore, two word types were identified (K=2), with proportions equal to \{ p_{1}, p_{2} \} = \{ 4/5, 1/5 \} = \{ 0.8, 0.2 \}, and entropy score equal to:
H_{2111} = \frac{ 0.8 \cdot log_{2}(0.8) + 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.3109 For the word 4, two listeners identified the word type een (T_{j1}), one listener the word type de (T_{j2}), and another the word geen (T_{j3}). A blank space [B] is a symbol that defines the absence of a word in a space where a word is expected, as compared with other transcriptions, during the alignment procedure. Notice that for calculation purposes, because the blank space is not expected in such position, this is considered as a different word type. Consequently four word types were registered (K=4), with proportions equal to \{ p_{1}, p_{2}, p_{3}, p_{4} \} = \{ 2/5, 1/5, 1/5, 1/5 \} = \{ 0.4, 0.2, 0.2, 0.2 \} and entropy score equal to:
H_{4111} = \frac{ 0.4 \cdot log_{2}(0.4) + 3 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.8277 Lastly, for word 5, each listener transcribed a different word. it is important to highlight that when a listener does not identify a complete word, or part of it, (s)he is instructed to write [X] in that position. However, for the calculation of the entropy score, if more than one listener marks an unidentifiable word with [X], each one of them is considered a different word type. This is done to avoid the artificial reduction of the entropy score, as [X] values already indicate the word’s lack of intelligibility. . Consequently, five word types were observed, T_{j1}=kikker, T_{j2}=[X], T_{j3}=kokkin, T_{j4}=kikkers, T_{j5}=[X] (K=5), with proportions equal to \{ p_{1}, p_{2}, p_{3}, p_{4}, p_{5} \} = \{ 1/5, 1/5, 1/5, 1/5, 1/5 \} = \{ 0.2, 0.2, 0.2, 0.2, 0.2 \}, and entropy score equal to:
H_{5111} = \frac{ 5 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} = 1
As expected, the data exploration reveals from the start two significant features of the entropy scores: clustering and boundedness (refer to Section 3.2.3 and Section 3.3.3). In the case of the entropy scores, clustering arises due to the presence of various word-level scores generated for numerous sentences, originated from different speakers and evaluated in different blocks (see code output below, depicting the first ten observations of the data). On the other hand, entropy scores exhibit boundedness as they can only take on values within the continuous interval between zero and one, particularly H_{wsib} \in [0,1] (see Figure 15 showing three randomly selected speakers).
var_int = c('bid','cid','uid','wid','HS','A','Am','Hwsib')
head( data_H[, var_int], 10 ) bid cid uid wid HS A Am Hwsib
1 1 1 1 1 2 85 17 0.05703883
2 1 1 1 2 2 85 17 0.27857304
3 1 1 1 3 2 85 17 0.27857304
4 1 1 1 4 2 85 17 0.46073935
5 1 1 1 5 2 85 17 0.11344714
6 1 1 1 6 2 85 17 0.00010000
7 1 1 1 7 2 85 17 0.00010000
8 1 1 1 8 2 85 17 0.00010000
9 2 1 2 1 2 85 17 0.06626602
10 2 1 2 2 2 85 17 0.06626602
require(rethinking)
speakers = c(20,8,11, 25,30,6)
par(mfrow=c(2,3))
for( i in speakers ){
speaker = data_H$cid == i
dat = binning( y=data_H$Hwsib[speaker], min_y=0, max_y=1,
n_bins=20, dens=T )
plot(dat, ylab="Frequency-Density", ylim=c(-0.15,max(dat)), xaxt='n',
xlim=c(-0.05,1.05), xlab='entropy', col=rgb(0,0,0,0.6) )
abline( h=0, col='gray' )
abline( v=c(0,1), lty=2, col=rgb(0,0,0,0.3) )
axis( side=1, at=as.numeric(names(dat)),
labels=names(dat), las=2, cex.axis=0.8 )
mtext( text=paste0('Speaker ', i), side=3, adj=0, cex=1.1)
}
par(mfrow=c(1,1))Additionally, the data shows the 320 speakers’ speech samples consists of sentences with a minimum of 3 and a maximum of 11 words per sentence (M=7.1, SD=1.1), where most of the speech samples have between 5 and 9 words per sentence (see Figure 16).
speech_samples = with(data_H, table(cid, uid) )
speech_samples uid
cid 1 2 3 4 5 6 7 8 9 10
1 8 8 8 7 8 7 7 6 6 5
2 9 7 7 7 6 6 6 6 5 7
3 8 8 8 8 8 7 8 7 7 6
4 8 8 8 7 7 7 7 7 6 6
5 8 9 7 7 7 7 7 7 6 7
6 8 6 3 6 6 6 6 5 4 4
7 9 7 7 7 7 8 4 6 6 6
8 8 8 8 9 7 7 6 6 6 6
9 9 11 7 6 7 6 6 5 5 6
10 9 8 8 9 8 8 7 6 6 6
11 10 9 7 7 7 7 7 7 6 7
12 9 8 7 8 8 8 6 7 6 6
13 7 7 7 7 7 7 7 7 6 6
14 8 8 8 7 8 6 6 6 5 6
15 8 8 7 7 7 8 7 8 7 6
16 8 7 7 7 7 6 6 6 6 6
17 8 7 7 7 7 6 7 7 6 6
18 8 8 8 7 7 7 6 6 6 6
19 9 8 7 7 8 8 6 8 6 6
20 10 8 8 7 7 7 7 6 6 6
21 9 8 8 8 7 7 7 8 6 6
22 9 8 8 7 7 7 7 7 8 7
23 8 8 7 7 7 7 7 7 6 6
24 7 7 7 7 7 7 7 8 7 7
25 9 8 8 7 7 8 8 7 6 6
26 7 7 7 9 7 7 7 7 7 7
27 9 8 8 7 7 7 7 7 6 6
28 9 8 8 8 7 7 7 7 6 6
29 8 9 8 9 7 7 7 6 6 7
30 9 7 7 7 7 7 6 6 5 5
31 11 8 8 8 7 7 7 7 7 6
32 8 8 8 10 7 8 6 6 6 6
speech_samples = data.frame( speech_samples )
hist(speech_samples$Freq, breaks=20, xlim=c(2, 12),
main='', xlab='words per sentence')psych::describe( speech_samples$Freq ) vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 320 7.07 1.06 7 7.04 1.48 3 11 8 0.19 1.45 0.06
Moreover, the data comprised 16 normal hearing children (NH, hearing status category 1) and 16 hearing impaired children, with cochlear implant (HI/CI, hearing status category 2). At the time of the collection of the speech samples, the NH group were between 68 and 104 months old (M=86.3, SD=9.0), while HI/CI group were between 78 and 98 months old (M=86.3, SD=6.7).
d_mom = unique( data_H[,c('cid','HS','A')])
with( d_mom, table( A, HS ) ) HS
A 1 2
68 0 1
76 0 1
78 2 0
80 1 1
82 5 0
83 0 1
84 0 1
85 0 6
86 2 1
88 1 0
93 2 2
94 1 0
97 1 0
98 1 0
104 0 2
Lastly, before fitting the models using Bayesian inference, the data was formatted as a list including all necessary information for the fitting process:
dlist = list(
N = nrow(data_H),
B = max(data_H$bid),
I = max(data_H$cid),
U = max(data_H$uid),
W = max(data_H$wid),
cHS = max(data_H$HS),
bid = data_H$bid,
cid = data_H$cid,
uid = data_H$uid,
wid = data_H$wid,
HS = data_H$HS,
A = data_H$A,
Am = with( data_H, A - min(A) ),
Hwsib = data_H$Hwsib
)
str(dlist)List of 14
$ N : int 2263
$ B : int 5
$ I : int 32
$ U : int 10
$ W : num 11
$ cHS : int 2
$ bid : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
$ cid : int [1:2263] 1 1 1 1 1 1 1 1 1 1 ...
$ uid : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
$ wid : num [1:2263] 1 2 3 4 5 6 7 8 1 2 ...
$ HS : int [1:2263] 2 2 2 2 2 2 2 2 2 2 ...
$ A : int [1:2263] 85 85 85 85 85 85 85 85 85 85 ...
$ Am : int [1:2263] 17 17 17 17 17 17 17 17 17 17 ...
$ Hwsib: num [1:2263] 0.057 0.279 0.279 0.461 0.113 ...
This section articulates the probabilistic formalism of both the Normal LMM and the proposed Beta-proportion GLLAMM. Subsequently, it details the set of fitted models and the estimation procedure, along with the criteria employed to assess the quality of the Bayesian inference results. Lastly, the section outlines the methodology employed for model comparison.
The general mathematical formalism of the Normal LMM posits that the likelihood of the (manifest) entropy scores H_{wsib} follows a normal distribution, i.e.
\begin{align} H_{wsib} & \sim \text{Normal} \left( \mu_{sib}, \sigma_{i} \right) \end{align} \tag{5}
where \mu_{sib} represents the average entropy at the word-level and \sigma_{i} denotes the standard deviation of the average entropy at the word-level, varying for each speaker. Given the clustered nature of the data, \mu_{sib} is defined by the linear combination of individual characteristics and several random effects:
\begin{align} \mu_{sib} &= \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + u_{si} + e_{i} + a_{b} \end{align} \tag{6}
where HS_{i} and A_{i} denote the hearing status and chronological age of speaker i, respectively. Additionally, \alpha denote the general intercept, \alpha_{HS[i]} represents the average entropy for each hearing status group, and \beta_{A,HS[i]} denotes the evolution of the average entropy per unit of chronological age A_{i} for each hearing status group. Furthermore, u_{si} denotes the sentence-speaker random effects measuring the unexplained entropy variability within sentences for each speaker, e_{i} denotes the speaker random effects describing the unexplained entropy variability between speakers, and a_{b} denotes the block random effects assessing the unexplained variability between experimental blocks.
Several notably features of the Normal LLM can be discerned from the equations. Firstly, Equation 5 indicates that the variability of the average entropy at the word-level can differ for each speaker, enhancing the model’s robustness to mild or moderate data departures from the normal distribution assumption, such as heteroscedasticity or outliers (refer to Section 3.5). Secondly, Equation 6 reveals that the model assumes no transformation is applied to the relationship between the average entropy and the linear predictor. This is commonly known as a direct link function. Moreover, Equation 6 indicates that chronological age is centered around the minimum chronological age in the sample \bar{A}. The centering procedure is employed to prevent the interpretation of parameters outside the range of chronological ages available in the data (Everitt & Skrondal, 2010). Lastly, the equation implies the model considers separate intercept and separate slope of age for each hearing status group, i.e., NH and HI/CI speakers
Centering
Procedure use to facilitate the interpretation of regression parameters (Everitt & Skrondal, 2010).
The general mathematical formalism of the proposed Beta-proportion GLLAMM comprises four components: a response model, with its likelihood, linear predictor, and link function, and a structural model. The response model posits the likelihood of entropy scores follow a beta-proportion distribution,
GLLAMM components
\begin{align} H_{wsib} & \sim \text{BetaProp} \left( \mu_{ib}, M_{i} \right) \end{align} \tag{7}
where\mu_{ib} denotes the average entropy at the word-level and M_{i} signifies the dispersion of the average entropy at the word-level, varying for each speaker. Additionally, \mu_{ib} is defined as,
\begin{align} \mu_{ib} &= \text{logit}^{-1}[ a_{b} - SI_{i} ] \end{align} \tag{8}
where \text{logit}^{-1}(x) = exp(x) / (1+exp(x)) is the inverse-logit link function, a_{b} denotes the block random effects, and SI_{i} describes the speaker’s latent potential intelligibility. Conversely, the structural equation model relates the speakers’ latent potential intelligibility to the individual characteristics:
\begin{align} SI_{i} = \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i} \end{align} \tag{9}
where \alpha defines the general intercept, \alpha_{HS[i]} denotes the potential intelligibility for different hearing status groups, and \beta_{A,HS[i]} indicates the evolution of potential intelligibility per unit of chronological age for each hearing status group. Furthermore, e_{i} represents speakers block effects, describing unexplained potential intelligibility variability between speakers, and u_{i} = \sum_{s=1}^{S} u_{si}/S denotes sentence random effects, assessing the average unexplained potential intelligibility variability among sentences within each speaker, with S denoting the total number of sentences per speaker.
Several features are evident in this probabilistic representations. Firstly, akin to the Normal LMM, Equation 7 reveals that the dispersion of average entropy at the word level can differ for each speaker. This enhances the model’s robustness to mild or moderate data departures from the beta-proportion distribution assumption (refer to Section 3.5). Secondly, in contrast with the Normal LMM, Equation 8 shows the potential intelligibility of a speakers has a negative non-linear relationship with the entropy scores, explicitly highlighting the inverse relationship between intelligibility and entropy. This feature also maps the unbounded linear predictor to the bounded limits of the entropy scores. Thirdly, in contrast with the Normal LMM, Equation 9 demonstrates that the structural parameters are interpretable in terms of the latent potential intelligibility scores, where the scale of the latent trait is set by the general intercept \alpha, as required in latent variable models (Depaoli, 2021). Furthermore, the equation implies the model also considers separate intercept and separate slope of age for each hearing status group, i.e., NH and HI/CI speakers. Additionally, Equation 9 indicates that chronological age is centered around the minimum chronological age in the sample \bar{A}. Lastly, the same equation assumes the intelligibility scores have two sources of unexplained variability: e_{i} and u_{i}. The former represents inherent differences in potential intelligibility among different speakers, while the latter assumes that different sentences measure potential intelligibility differently due to variations in word difficulties and their interplay within the sentence.
Bayesian procedures require the incorporation of priors (refer to Section 3.1). This study establishes priors and hyperpriors for the parameters of both the Normal LMM and the Beta-proportion GLLAMM using prior predictive simulations. This procedure entails the semi-independent simulation of parameters, which are subsequently transformed into simulated data values according to the models’ specifications. This procedure aims to establish meaningful priors and comprehend its implications within the context of the model before incorporating any information derived from empirical data (McElreath, 2020).
Prior predictive simulations
Procedure that entails the semi-independent simulation of parameters, which are subsequently transformed into simulated data values according to the models’ specifications. The procedure aims to establish meaningful priors and comprehend its implications within the context of the model before incorporating any information derived from empirical data (McElreath, 2020).
For the parameters of the Normal LMM, non-informative priors and hyperpriors are established to align with analogous model assumptions in frequentist methods (refer to Section 3.1.4). The specified priors are as follows:
As described in Section 7.3, the models initially consider one \sigma prior for all the speakers. This choice implies that the presumed uncertainty for the unexplained variability of the average entropy at the word-level is the same for all speakers, prior to the observation of empirical data.
\begin{align} \sigma_{i} &\sim \text{Exponential}\left( 2 \right) \end{align} \tag{10}
The left panel of Figure 17 shows the weakly informative prior expects \sigma to be possible only in a positive range, as it is required for variability parameters (Depaoli, 2021). Furthermore, the right panel of Figure 17 shows that when transformed to the entropy scale, the model expect predictions to fall beyond the feasible range of the outcome.
require(rethinking)
n = 1000
param_pscale = rexp(n=n, rate=2 )
param_oscale = rnorm( n=n, mean=0.5, sd=param_pscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,3),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(sigma))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))Furthermore, as described in Section 7.1.1 and Section 7.3, there is a possibility that the model considers one \sigma_{i} prior for each of the speakers in the data. This choice implies that the presumed uncertainty about unexplained variability of the average entropy at the word-level is similar for each speaker, prior to observing empirical data. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5). \begin{align} r_{S} &\sim \text{Exponential}\left( 2 \right) \\ \sigma_{i} &\sim \text{Exponential}\left( r_{S} \right) \end{align} \tag{11}
The left panel of Figure 18 shows the weakly informative prior expects \sigma_{i} to be possible only in a positive range, as it is required for variability parameters (Depaoli, 2021). The panel also shows the parameters are more likely to happen in the interval of [0, 2.5]. Moreover, the right panel of Figure 18 shows that when the prior is transformed to the entropy scale, the model expect scores to fall beyond the feasible range of the outcome.
require(rethinking)
n = 1000
r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_pscale = r_s * z_s
param_oscale = rnorm( n=n, mean=0.5, sd=param_pscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,3),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(sigma[i]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))This parameter is used in preliminary models where no mathematical formulations regarding how speaker-related factors influence intelligibility are investigated. The prior distribution for \alpha under the Normal LMM is described in Equation 12.
\begin{align} \alpha &\sim \text{Normal} \left( 0, 0.05 \right) \end{align} \tag{12}
The left panel of Figure 19 show the prior is an narrowly concentrated around zero. Moreover, the right panel of Figure 19, demonstrate that when the parameter is transformed to the entropy scale, the model anticipates entropy scores at low levels of the feasible range of the outcome. This implies that particular bias in entropy scores towards lower values are expected by prior.
require(rethinking)
n = 1000
r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s
param_pscale = rnorm( n=n, mean=0, sd=0.05 )
param_oscale = rnorm(n=n, mean=param_pscale, sd=param_hscale)
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(M))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))The prior distribution for the Normal LMM is described in Equation 13. Notably, the same prior is applied to both two hearing status categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.
\begin{align} \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.2 \right) \end{align} \tag{13}
The left panel of Figure 20 reveal a weakly informative prior that restricts the range of probability of \alpha_{HS[i]} between [0.3, 0.7]. This implies that no particular bias towards entropy values above or below 0.5 for different hearing status groups is present in the priors. However, the right panel of Figure 20 demonstrate that when the prior is transformed to the entropy scale, the model anticipates a concentration of data around low levels of entropy, but also beyond the feasible range of the outcome.
require(rethinking)
n = 1000
r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s
param_pscale = rnorm( n=n, mean=0, sd=0.2 )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(alpha[HS]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))The prior distribution for the Normal LMM is described in Equation 23. Notably, the same prior is applied to both two hearing status categories. This choice implies that the evolution of entropy attributed to chronological age between the categories is presumed to have similar uncertainties prior to the observation of empirical data.
\begin{align} \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \end{align} \tag{14}
The left panel of Figure 21 shows the prior restricts \beta_{A,HS[i]} to be mostly within the range of [-0.4, 0.4]. This implies that there is no particular bias towards a positive or negative evolution of entropy scores due to chronological age per hearing status group. However, the right panel of Figure 21 show that when this prior is transformed to the entropy scale, the model anticipate a concentration of entropy values at lower levels, but it also expects entropy scores significantly beyond the feasible range of the outcome.
require(rethinking)
n = 1000
r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s
param_pscale = rnorm( n=n, mean=0, sd=0.1 )
usd = function(i){ param_pscale * data_H$Am[i] }
param_mscale = sapply( 1:length(data_H$Am), usd )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(beta[AHS]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))The prior distribution of e_{i} for the Normal LMM is described in Equation 15. The same prior is assigned to each speaker in the sample. This choice implies that differences in entropy scores between speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5).
\begin{align} m_{i} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{i} &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m_{i}, s_{i} \right) \end{align} \tag{15}
The left panel of Figure 22 shows the prior anticipates differences in entropy scores between speakers as large 3 units of entropy. However, the right panel of Figure 22 demonstrate that when transformed to the entropy scale the model anticipates a concentration scores around low levels, but also it expects the differences to go way beyond the feasible range of the outcome.
require(rethinking)
n = 1000
r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s
m_i = rnorm(n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(e[i]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))The prior distribution of u_{si} for the Normal LMM is described in Equation 16. The same prior is assigned to each sentence within each speakers in the sample. This choice implies that the average entropy score differences among sentences within speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters (refer to Section 3.1.5).
\begin{align} m_{u} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{u} &\sim \text{Exponential} \left( 2 \right) \\ u_{si} &\sim \text{Normal} \left( m_{u}, s_{u} \right) \\ \end{align} \tag{16}
The left panel of Figure 23 shows the prior restricts the average differences in entropy among sentences within speakers can be as large as 3 units of measurement. Furthermore, the right panel of Figure 23 demonstrate that when transformed to the entropy scale the model anticipates a concentration of scores around mid-levels of entropy. More importantly, the model expects the differences to go beyond the feasible range of the outcome.
require(rethinking)
n = 1000
r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s
m_u = rnorm(n=n, mean=0, sd=0.05 )
s_u = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_u, sd=s_u )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(u[i]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))The prior distribution for the Normal LMM is described in Equation 17. The same prior is assigned to each block. This choice implies that the average entropy score differences among blocks are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5).
\begin{align} m_{b} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{b} &\sim \text{Exponential} \left( 2 \right) \\ a_{b} &\sim \text{Normal} \left( m_{b}, s_{b} \right) \end{align} \tag{17}
The left panel of Figure 24 shows a prior with no particular bias towards differences between blocks above or below zero units of entropy. Nevertheless, the right panel of Figure 24 demonstrate that when the prior is transformed to the entropy scale, the model anticipates a concentration of data around lower levels of entropy, but also contemplates differences beyond the feasible range of the outcome.
require(rethinking)
n = 1000
r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s
m_b = rnorm(n=n, mean=0, sd=0.05 )
s_b = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_b, sd=s_b )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(u[si]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))After the careful assessment of the prior implications for each parameter, the expected prior distribution for the linear predcitor can be constructed for the Normal LMM. The prior predictive simulation can be described as in Equation 18:
\begin{align} m &\sim \text{Normal} \left( 0, 0.05 \right) \\ s &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m, s \right) \\ u_{si} &\sim \text{Normal} \left( m, s \right) \\ a_{b} &\sim \text{Normal} \left( m, s \right) \\ \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.2 \right) \\ \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \\ g(\cdot) &= \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{si} + a_{b} \\ \end{align} \tag{18}
The left panel of Figure 34 shows the prior expects speakers’ potential intelligibility scores to be more probable between [-2.5, 2.5], implying there is particular bias towards negative entropy scores is present jointly in these priors. Furthermore, the right panel of Figure 34, demonstrate that when transformed to the entropy scale, the model anticipates prediction of entropy scores within its feasible range, but somewhat more probable in the extremes of entropy.
require(rethinking)
n = 1000
r_s = rexp(n=n, rate=2 )
z_s = rexp(n=n, rate=1 )
param_hscale = r_s * z_s
m = rnorm(n=n, mean=0, sd=0.05 )
s = rexp(n=n, rate=2 )
e_i = rnorm( n=n, mean=m, sd=s )
u_si = rnorm( n=n, mean=m, sd=s )
a_b = rnorm( n=n, mean=m, sd=s )
aHS = rnorm( n=n, mean=0, sd=0.2 )
bAHS = rnorm( n=n, mean=0, sd=0.1 )
param_pscale = aHS + bAHS + u_si + e_i + a_b
param_oscale = rnorm( n=n, mean=param_pscale,
sd=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-3,3),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(SI[i]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))For the parameters of the Beta-proportion GLLAMM, weakly informative priors and hyperpriors are established (refer to Section 3.1.4). The specified priors are as follows:
Similar to the Normal LMM, Section 7.3 describes a Beta-proportion GLLAMM that initially considers one M for all speakers in the data. This choice implies that the presumed uncertainty for the unexplained variability of the average entropy at the word-level is the same for all speakers, prior to the observation of empirical data.
\begin{align} M &\sim \text{Exponential}\left( 0.4 \right) \end{align} \tag{19}
The left and right panel of Figure 26, demonstrate the prior of M expects the parameters to be more probable in a positive range between [0, 7], while predicting scores within the boundaries of the data. This implies that no particular bias is present in the word-level entropy unexplained variability, only that it is positive, as expected for measures of variability.
require(rethinking)
n = 1000
param_pscale = rexp( n=n, rate=0.4 )
param_oscale = rbeta2( n=n, prob=0.5, theta=param_pscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,10),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(M))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))Furthermore, as described in Section 7.1.2 and Section 7.3, there is a possibility that the model considers one M_{i} prior for each speakers in the data. This choice implies the presumed uncertainty for the unexplained dispersion of the average entropy at the word-level is similar for each speaker, prior to the observation of empirical data. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5).
\begin{align} r_{M} &\sim \text{Exponential}\left( 0.2 \right) \\ M_{i} &\sim \text{Exponential}\left( r_{M} \right) \end{align} \tag{20}
The left and right panel of Figure 27, demonstrate the prior of M_{i} expects the parameters to be more probable in a positive range between [0, 20], while at the same time predicting data within the boundaries of the entropy scores. This implies that no particular bias is present in the word-level entropy unexplained variability, only that it is positive, as expected for measures of variability.
require(rethinking)
n = 1000
r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_pscale = r_M * z_M
param_oscale = rbeta2( n=n, prob=0.5, theta=param_pscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,20),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(M[i]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))Considering that the structural parameters are now interpretable in terms of the (latent) potential intelligibility scores, the general intercept \alpha is used to set the scale of the latent trait, as it is required in latent variable models (Depaoli, 2021) (refer to Section 3.1.4). The prior distribution for \alpha under the Beta-proportion GLLAMM is described in Equation 21.
\begin{align} \alpha &\sim \text{Normal} \left( 0, 0.05 \right) \end{align} \tag{21}
The left panel of Figure 28 show the prior is narrowly concentrated around zero. Moreover, the right panel of Figure 28, demonstrate that when the parameter is transformed to the entropy scale, the model anticipates entropy scores at mid-levels of the feasible range of the outcome. This implies that no particular bias in entropy scores are expected by prior.
require(rethinking)
n = 1000
r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M
param_pscale = rnorm( n=n, mean=0, sd=0.05 )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
theta=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(M))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))The prior distribution for the Beta-proportion GLLAMM is described in Equation 22. Notably, the same prior is applied to both two hearing status categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.
\begin{align} \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right) \end{align} \tag{22}
The right panel of Figure 29, demonstrate that when the \alpha_{HS[i]} prior is transformed to the entropy scale, the model anticipates a concentration of data around mid levels of entropy, and not beyond the feasible range of the outcome. This implies that no particular bias towards specific entropy score values are expected from the using the prior.
require(rethinking)
n = 1000
r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M
param_pscale = rnorm( n=n, mean=0, sd=0.3 )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
theta=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(alpha[HS]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))The prior distribution for the Beta-proportion GLLAMM is described in Equation 23. Notably, the same prior is applied to both two hearing status categories. This choice implies that the evolution of potential intelligibility attributed to chronological age between the categories is presumed to have similar uncertainties, prior to the observation of empirical data.
\begin{align} \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \end{align} \tag{23}
The left panel of Figure 30 shows the weakly informative prior has no particular bias towards a positive or negative evolution of potential intelligibility due to chronological age per hearing status group. Furthermore, the right panel of Figure 30, demonstrate that when transformed to the entropy scale, the model anticipates a slight concentration of data around mid levels of entropy, but more importantly, it does not expect data beyond the feasible range of the outcome.
require(rethinking)
n = 1000
r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M
param_pscale = rnorm( n=n, mean=0, sd=0.1 )
usd = function(i){ param_pscale * data_H$Am[i] }
param_mscale = sapply( 1:length(data_H$Am), usd )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
theta=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(beta[AHS]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))The prior distribution for the Beta-proportion GLLAMM is described in Equation 24. The same prior is assigned to each speakers in the sample. This choice implies that differences in potential intelligibility differences between speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters, called hyperpriors (refer to Section 3.1.5).
\begin{align} m_{i} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{i} &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m_{i}, s_{i} \right) \end{align} \tag{24}
The left panel of Figure 31 shows the prior anticipates differences in intelligibility between speakers as large 3 units of measurement. Furthermore, the right panel of Figure 31, demonstrate that when transformed to the entropy scale, the model anticipates a high concentration around mid-levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences in potential intelligibility between speakers are expected resulting from using this prior.
require(rethinking)
n = 1000
r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M
m_i = rnorm( n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
theta=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(e[i]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))The prior distribution of u_{i} for the Beta-proportion GLLAMM is described in Equation 25. The same prior is assigned to each sentence within each speakers in the sample. This choice implies that the average potential intelligibility differences among sentences within speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5). Next, the within sentence-speaker differences are then aggregated to the speaker level to form the sentence random effects u_{i},
\begin{align} m_{u} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{u} &\sim \text{Exponential} \left( 2 \right) \\ u_{si} &\sim \text{Normal} \left( m_{u}, s_{u} \right) \\ u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S} \end{align} \tag{25}
The left panel of Figure 32 shows the prior restricts the average differences in potential intelligibility among sentences within speakers can be as large as 0.8 units of measurement. Furthermore, the right panel of Figure 32, demonstrate that when u_{i} is transformed to the entropy scale, the model anticipates a high concentration of scores around mid-levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences in potential intelligibility is expected between speakers.
require(rethinking)
n = 1000
r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M
m_u = rnorm( n=n, mean=0, sd=0.05 )
s_u = rexp(n=n, rate=2 )
param_pscale = replicate(
n=n, expr=mean( rnorm( n=10, mean=m_u, sd=s_u ) ) )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale),
theta=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(u[i]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))The prior distribution for the Beta-proportion GLLAMM is described in Equation 26. The same prior is assigned to each block. This choice implies that the average entropy scores differences among blocks are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5).
\begin{align} m_{b} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{b} &\sim \text{Exponential} \left( 2 \right) \\ a_{b} &\sim \text{Normal} \left( m_{b}, s_{b} \right) \end{align} \tag{26}
The left panel of Figure 33 shows a prior with no particular bias towards positive or negative differences between blocks. Furthermore, the right panel of Figure 33 demonstrate that when transformed to the entropy scale, the model anticipates a high concentration of data around mid levels of entropy, but not beyond the feasible range of the outcome.
require(rethinking)
n = 1000
r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M
m_b = rnorm( n=n, mean=0, sd=0.05 )
s_b = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_b, sd=s_b )
param_oscale = rbeta2( n=n, prob=inv_logit(param_pscale),
theta=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(u[si]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))After the careful assessment of the prior implications for each parameter, the expected prior distribution for the potential intelligibility can be constructed for the Beta-proportion GLLAMM. The prior predictive simulation can be described as in Equation 27:
\begin{align} \alpha &\sim \text{Normal} \left( 0, 0.05 \right) \\ \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right) \\ \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \\ m &\sim \text{Normal} \left( 0, 0.05 \right) \\ s &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m, s \right) \\ u_{si} &\sim \text{Normal} \left( m, s \right) \\ u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S} \\ a_{b} &\sim \text{Normal} \left( m, s \right) \\ SI_{si} &= \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i} \\ \end{align} \tag{27}
The left panel of Figure 34 shows the prior expects speakers’ potential intelligibility scores to be more probable between [-3, 3], implying no particular bias towards positive or negative potential intelligibility is present jointly in these priors. Furthermore, the right panel of Figure 34, demonstrate that when transformed to the entropy scale, the model anticipates prediction of entropy scores within its feasible range, but somewhat more probable in the extremes of entropy.
require(rethinking)
n = 1000
r_M = rexp(n=n, rate=0.2 )
z_M = rexp(n=n, rate=1 )
param_hscale = r_M * z_M
m = rnorm( n=n, mean=0, sd=0.05 )
s = rexp(n=n, rate=2 )
e_i = rnorm( n=n, mean=m, sd=s )
u_i = replicate( n=n,
exp=mean( rnorm( n=10, mean=m, sd=s ) ) )
a_b = rnorm( n=n, mean=m, sd=s )
a = rnorm( n=n, mean=0, sd=0.05 )
aHS = rnorm( n=n, mean=0, sd=0.3 )
bAHS = rnorm( n=n, mean=0, sd=0.1 )
param_pscale = a + aHS + bAHS + e_i + u_i
param_oscale = rbeta2( n=n, prob=inv_logit(a_b-param_pscale),
theta=param_hscale )
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-3,3),
show.HPDI=0.95,
main='Parameter scale',
xlab=expression(SI[i]))
dens( param_oscale, xlim=c(0,1),
show.HPDI=0.95,
main='Entropy scale',
xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))This study evaluates the comparative predictive capabilities of both the Normal LMM and the Beta-proportion GLLAMM (RQ1) while simultaneously examining various formulations regarding how speaker-related factors influence intelligibility (RQ3). In this context, the predictive capabilities of the models are intricately connected to these formulations. As a result, the study requires fitting 12 different models, each representing a specific manner to investigate one or both research questions. The models comprised six versions of both the Normal LMM and the Beta-proportion GLLAMM. The differences among the models hinged on (1) whether they addressed data clustering in conjunction with measurement error, denoted as the model type, (2) the assumed distribution for the entropy scores, which aimed to handle boundedness, (3) whether the model incorporates a robust feature to address mild or moderate departures of the data from distributional assumptions, and (4) the inclusion or exclusion of speaker-related factors in the models. A detailed overview of the fitted models is available in Table 2.
| Model | Entropy | Robust | Fixed effects | |||
|---|---|---|---|---|---|---|
| Model | type | distribution | feature | \beta_{HS[i]} | \beta_{A} | \beta_{A,HS[i]} |
| 1 | LMM | Normal | No | No | No | No |
| 2 | LMM | Normal | No | Yes | Yes | No |
| 3 | LMM | Normal | No | Yes | No | Yes |
| 4 | LMM | Normal | Yes | No | No | No |
| 5 | LMM | Normal | Yes | Yes | Yes | No |
| 6 | LMM | Normal | Yes | Yes | No | Yes |
| 7 | GLLAMM | Beta-prop. | No | No | No | No |
| 8 | GLLAMM | Beta-prop. | No | Yes | Yes | No |
| 9 | GLLAMM | Beta-prop. | No | Yes | No | Yes |
| 10 | GLLAMM | Beta-prop. | Yes | No | No | No |
| 11 | GLLAMM | Beta-prop. | Yes | Yes | Yes | No |
| 12 | GLLAMM | Beta-prop. | Yes | Yes | No | Yes |
The following tabset panel provides the commentated Stan code for all fitted model. Furthermore, the models are implemented using non-centered priors (refer to Section 3.1.5).
```{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> s_w; // speaker, utterance, word sd (one overall)
//real<lower=0> r_s; // global rate for SD
//vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
//vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
s_w ~ exponential( 2 );
//r_s ~ exponential( 2 );
//z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ normal( mu[n] , s_w );
// Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model01.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
``````{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> s_w; // speaker, utterance, word sd (one overall)
//real<lower=0> r_s; // global rate for SD
//vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
//vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
s_w ~ exponential( 2 );
//r_s ~ exponential( 2 );
//z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ normal( mu[n] , s_w );
// Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model02.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
``````{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> s_w; // speaker, utterance, word sd (one overall)
//real<lower=0> r_s; // global rate for SD
//vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
//vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
s_w ~ exponential( 2 );
//r_s ~ exponential( 2 );
//z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ normal( mu[n] , s_w );
// Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model03.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
``````{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> s_w; // speaker, utterance, word sd (one overall)
real<lower=0> r_s; // global rate for SD
vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//s_w ~ exponential( 2 );
r_s ~ exponential( 2 );
z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ normal( mu[n] , s_w );
Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model04.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
``````{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> s_w; // speaker, utterance, word sd (one overall)
real<lower=0> r_s; // global rate for SD
vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
bAm ~ normal( 0 , 0.1 );
//bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//s_w ~ exponential( 2 );
r_s ~ exponential( 2 );
z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ normal( mu[n] , s_w );
Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model05.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
``````{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
//real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> s_w; // speaker, utterance, word sd (one overall)
real<lower=0> r_s; // global rate for SD
vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] s_w; // speaker, utterance, word sd (one per speaker)
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
s_w = z_s * r_s; // non-centered speaker, utterance, word sd
// average entropy
for(n in 1:N){
mu[n] = //a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
b_i[ bid[n] ] +
e_i[ cid[n] ] +
u_si[ cid[n], uid[n] ];
}
}
model{
// fixed effects priors
//a ~ normal( 0 , 0.05 );
aHS ~ normal( 0, 0.2 );
// bAm ~ normal( 0 , 0.1 );
bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//s_w ~ exponential( 2 );
r_s ~ exponential( 2 );
z_s ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ normal( mu[n] , s_w );
Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
}
}
"
# saving
model_nam = "model06.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
``````{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> Mw; // 'sample size' parameter
//real<lower=0> r_M; // global rate for 'sample size'
//vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
//vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
Mw ~ exponential( 0.4 );
//r_M ~ exponential( 0.2 );
//z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ beta_proportion( mu[n] , Mw );
// Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model07.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
``````{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> Mw; // 'sample size' parameter
//real<lower=0> r_M; // global rate for 'sample size'
//vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
//vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
Mw ~ exponential( 0.4 );
//r_M ~ exponential( 0.2 );
//z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ beta_proportion( mu[n] , Mw );
// Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model08.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
``````{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
real<lower=0> Mw; // 'sample size' parameter
//real<lower=0> r_M; // global rate for 'sample size'
//vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
//vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
//Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.1 );
bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
Mw ~ exponential( 0.4 );
//r_M ~ exponential( 0.2 );
//z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
Hwsib[n] ~ beta_proportion( mu[n] , Mw );
// Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model09.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
``````{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
// vector[cHS] aHS; // HS effects
// real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> Mw; // 'sample size' parameter
real<lower=0> r_M; // global rate for 'sample size'
vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
// aHS[ HS[n] ] +
// bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
// aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//Mw ~ exponential( 0.4 );
r_M ~ exponential( 0.2 );
z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ beta_proportion( mu[n] , Mw );
Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model10.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
``````{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
real bAm; // Am effects
// vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> Mw; // 'sample size' parameter
real<lower=0> r_M; // global rate for 'sample size'
vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
bAm*Am[n] +
// bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
bAm ~ normal( 0 , 0.1 );
// bAmHS ~ normal( 0 , 0.1 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//Mw ~ exponential( 0.4 );
r_M ~ exponential( 0.2 );
z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ beta_proportion( mu[n] , Mw );
Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model11.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
``````{r}
mcmc_code = "
data{
// dimensions
int N; // number of experimental runs
int B; // max. number of blocks
int I; // max. number of experimental units (speakers)
int U; // max. number of sentences
int W; // max. number of words
// category numbers
int cHS; // max. number of categories in HS
// data
array[N] int<lower=1, upper=B> bid; // block id
array[N] int<lower=1, upper=I> cid; // speaker's id
array[N] int<lower=1, upper=U> uid; // sentence's id
array[N] int<lower=1, upper=cHS> HS; // hearing status
array[N] real Am; // chron. age - min( chron. age )
array[N] real Hwsib; // replicated entropies
}
parameters{
// fixed effects parameters
real a; // intercept
vector[cHS] aHS; // HS effects
// real bAm; // Am effects
vector[cHS] bAmHS; // Am effects (per HS)
// random effects parameters
real m_b; // block RE mean
real<lower=0> s_b; // block RE sd
vector[B] z_b; // non-centered block RE
real m_i; // speaker RE mean
real<lower=0> s_i; // speaker RE sd
vector[I] z_i; // non-centered speaker RE
real m_u; // speaker, utterance RE mean
real<lower=0> s_u; // speaker, utterance RE sd
matrix[I,U] z_u; // non-centered speaker, utterance RE
// variability parameters
// real<lower=0> Mw; // 'sample size' parameter
real<lower=0> r_M; // global rate for 'sample size'
vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)
}
transformed parameters{
// to track
vector[B] b_i; // block random effects
vector[I] e_i; // speaker random effects
matrix[I,U] u_si; // sentence random effects
vector[I] u_i; // sentence average random effects
vector[I] Mw; // non-centered 'sample size' (one per speaker)
vector[I] SI; // SI index
vector[N] mu; // NO TRACK
// random effects
b_i = m_b + s_b*z_b; // non-centered block RE
e_i = m_i + s_i*z_i; // non-centered speaker RE
u_si = m_u + s_u*z_u; // non-centered utterance RE
Mw = z_M * r_M; // non-centered 'sample size'
// intelligibility and average entropy
for(i in 1:I){
u_i[ i ] = mean( u_si[ i, ] );
}
for(n in 1:N){
SI[ cid[n] ] = a +
aHS[ HS[n] ] +
// bAm*Am[n] +
bAmHS[ HS[n] ]*Am[n] +
e_i[ cid[n] ] +
u_i[ cid[n] ];
mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
}
}
model{
// fixed effects priors
a ~ normal( 0 , 0.05 );
aHS ~ normal( 0 , 0.3 );
// bAm ~ normal( 0 , 0.3 );
bAmHS ~ normal( 0 , 0.3 );
// random effects priors
m_b ~ normal( 0 , 0.05 );
s_b ~ exponential( 2 );
z_b ~ std_normal();
m_i ~ normal( 0 , 0.05 );
s_i ~ exponential( 2 );
z_i ~ std_normal();
m_u ~ normal( 0 , 0.05 );
s_u ~ exponential( 2 );
to_vector( z_u ) ~ std_normal();
// variability priors
//Mw ~ exponential( 0.4 );
r_M ~ exponential( 0.2 );
z_M ~ exponential( 1 );
// likelihood
for(n in 1:N){
// Hwsib[n] ~ beta_proportion( mu[n] , Mw );
Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
}
}
generated quantities{
// track
vector[N] log_lik;
// log-likelihood
for(n in 1:N){
// log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
}
}
"
# saving
model_nam = "model12.stan"
writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```Moreover, the following code is provided so the reader can fit all Stan models.
```{r}
for(i in 1:12){
model_nam = paste0( ifelse(i<10, 'model0', 'model'), i, '.stan')
model_in = file.path(getwd(), 'real_models')
model_out = file.path(getwd(), 'real_chain')
mod = cmdstan_model( file.path(model_in, model_nam) )
print(model_nam)
mod$sample( data=dlist,
output_dir=model_out,
output_basename = str_replace(model_nam, '.stan', ''),
num_warmup=2000, num_samples=2000,
chains=4, parallel_chains=4,
max_treedepth=20, adapt_delta=0.95) #,init=0
}
```The models were estimated using R version 4.2.2 (R Core Team, 2015) and Stan version 2.26.1 (Stan Development Team., 2021). Four Markov chains were implemented for each parameter, each with distinct starting values. Each chain underwent 4,000 iterations, where the first 2,000 served as a warm-up phase and the remaining 2,000 were considered samples from the posterior distribution.
Verification of stationarity, convergence, and mixing for the parameter chains involved graphical analysis and diagnostic statistics. Graphical analysis utilized trace, trace-rank, and autocorrelation plots (ACF). Diagnostic statistics included the potential scale reduction factor statistics \widehat{\text{R}} with a cut-off value of 1.05 (A. Vehtari, Gelman, Simpson, Carpenter, & Bürkner, 2021a). Furthermore, to confirm whether the parameters posterior distributions were generated with a sufficient number of uncorrelated sampling points, each posterior distribution density plot was inspected along with their effective sample size statistics n_{\text{eff}} (Gelman et al., 2014).
The study compares the fitted models using three criteria: the deviance information criterion (DIC) by Spiegelhalter et al. (Spiegelhalter, Best, Carlin, & van der Linde, 2002), the widely applicable information criterion (WAIC) by Watanabe (2013), and the Pareto Smoothing Importance Sampling criterion (PSIS) by Vehtari et al. (2017). These criteria score models in terms of deviations from perfect predictive accuracy, with smaller values indicating less deviation (McElreath, 2020). Specifically, DIC measures in-sample deviations, while WAIC and PSIS offer an approximate measure of out-of-sample deviations. Deviations from perfect predictive accuracy serve as the closest estimate for the Kullback-Leibler divergence (Kullback & Leibler, 1951), which measures the degree to which a model accurately represents the true distribution of the data. Moreover, WAIC and PSIS are considered full Bayesian criteria as they incorporate all the information encompassed in the parameter’s posterior distribution. This effectively integrates and reports the inherent uncertainty in the predictive accuracy estimates. Predictive accuracy aside, PSIS offers an additional advantage in identifying highly influential data points. To achieve this, the criterion uses a built-in warning system that flags observations that make out-of-sample predictions unreliable. The key intuition is that observations that are relatively unlikely, according to the model, exert more influence and render predictions more unreliable than those relatively expected (McElreath, 2020).
This section presents the results of the Bayesian inference procedures, with particular emphasis in answering the three research questions.
The posterior estimates of the models are loaded in the following manner. file_id() is a user-defined function that identifies the stanfit generated files within a particular directory.
```{r}
# load reference models
for(i in 1:12){
model_nam = paste0( ifelse(i<10, 'model0', 'model'), i)
model_out = file.path( save_dir, 'real_chain')
model_fit = file_id( model_out, model_nam )
assign( model_nam,
rstan::read_stan_csv( file.path( model_out, model_fit ) ) )
}
```This research question evaluates the effectiveness of the Beta-proportion GLLAMM in handling the features of entropy scores by comparing its predictive accuracy to the Normal LMM. Models 1, 4, 7, and 10 are specifically chosen for this comparison because their assumptions exclusively address the features of the scores, without integrating additional covariate information. As detailed in Table 2, Model 1 is a Normal LMM that solely addresses data clustering. Building upon this, Model 4 introduces a robust feature. Conversely, Model 7 is a Beta-proportion GLLAMM that deals with boundedness, measurement error and data clustering, and Model 10 extends this model by incorporating a robust feature.
Figure 35 displays values for the DIC, WAIC, and PSIS. They also include the components dWAIC and dPSIS, highlighting the differences in out-of-sample deviations from the best-fitting model and its associated uncertainty. The associated Table 3 and Table 4 provide similar information, while also reporting the pWAIC and pPSIS values, indicating the penalization received by the models for their complexity (roughly associated with their number of parameters). Lastly, the tables show the weight of evidence, which summarizes the relative support for each model.
Overall, all criteria consistently point to Model 10 as the most plausible choice for the data. The model exhibits the lowest values for both WAIC and PSIS, establishing itself as the model with the least deviation from perfect predictive accuracy among those under comparison. Additionally, Figure 35 visually demonstrates the non-overlapping uncertainty (horizontal blue lines) in both dWAIC and dPSIS values for Models 1, 4, and 7 when compared to Model 10. This indicates that Model 10 significantly deviates the least from perfect predictive accuracy when compared to the rest of the models. Lastly, the weight of evidence in Table 3 and Table 4 underscores that 100\% of the evidence aligns with and supports Model 10.
set.seed(12345)
RQ1_WAIC = compare( func=WAIC,
model01, model04,
model07, model10 )
RQ1_WAIC = cbind( DIC=with(RQ1_WAIC, WAIC-2*pWAIC),
RQ1_WAIC )set.seed(12345)
RQ1_PSIS = compare( func=PSIS,
model01, model04,
model07, model10 )
RQ1_PSIS = cbind( DIC=with(RQ1_PSIS, PSIS-2*pPSIS),
RQ1_PSIS )par(mfrow=c(1,2))
plot_compare( compare_obj=RQ1_WAIC,
ns=1, m='WAIC', dm=T )
plot_compare( compare_obj=RQ1_PSIS,
ns=1, m='PSIS', dm=T )
par(mfrow=c(1,1))RQ1_WAIC$Model = as.integer( str_sub( rownames(RQ1_WAIC), start=-2) )
kable( round( RQ1_WAIC[,c(8,1:7)], 3), row.names=F, align='crrrrrrr',
escape=F, digits=2 )| Model | DIC | WAIC | SE | dWAIC | dSE | pWAIC | weight |
|---|---|---|---|---|---|---|---|
| 10 | -9741.66 | -9630.63 | 276.64 | 0.00 | NA | 55.52 | 1 |
| 7 | -9649.54 | -9586.00 | 274.50 | 44.63 | 17.89 | 31.77 | 0 |
| 4 | -2670.62 | -2024.84 | 127.02 | 7605.78 | 263.22 | 322.89 | 0 |
| 1 | -2278.68 | -1761.10 | 101.80 | 7869.53 | 266.54 | 258.79 | 0 |
RQ1_PSIS$Model = as.integer( str_sub( rownames(RQ1_PSIS), start=-2) )
kable( round( RQ1_PSIS[,c(8,1:7)], 3), row.names=F, align='crrrrrrr',
escape=F, digits=2 )| Model | DIC | PSIS | SE | dPSIS | dSE | pPSIS | weight |
|---|---|---|---|---|---|---|---|
| 10 | -9741.66 | -9629.27 | 276.74 | 0.00 | NA | 56.19 | 1 |
| 7 | -9649.54 | -9585.92 | 274.56 | 43.36 | 17.67 | 31.81 | 0 |
| 4 | -2670.62 | -2007.66 | 128.57 | 7621.61 | 263.60 | 331.48 | 0 |
| 1 | -2278.68 | -1753.71 | 102.09 | 7875.57 | 266.54 | 262.48 | 0 |
Upon closer examination, the reasons behind the observed disparities in the models become more apparent. Specifically, Figure 36 highlights that the Normal LMM, as outlined in Model 4, fails to capture the underlying data patterns, resulting in predictions that are physically inconsistent, falling outside the outcome’s range between zero and one. Further insight into this issue is provided by Figure 37 and Figure 39. Figure 37 displays Model 4’s score prediction densities which bear no resemblance to the actual data densities. Furthermore, the top two panels in Figure 39 reveal that misspecification in the Normal LMM causes the model to be more surprised by ‘extreme’ entropy scores, leading to their identification as highly unlikely and influential observations. Consequently, the model is rendered unreliable due to the potential biases present in the parameter estimates. In contrast, the Beta-proportion GLLAMM appears to effectively capture the data patterns, generating predictions within the expected data range. This is evident in Figure 36 and complemented by Figure 38 and Figure 39. In Figure 38, Model 10 display prediction densities that bear more resemblance to the actual data densities. Furthermore, the bottom two panels in Figure 39 show the model is less surprised by ‘extreme’ scores, fostering more trust in the model’s estimates.
plot_speaker(d=data_H,
stanfit_obj1=model04,
stanfit_obj2=model10,
p=0.95,
decreasing=F,
leg=c('model 04','model 10'))col_string = rep( rethink_palette[2], 2)
pred_speaker_pairs(speakers=c(20,8,11, 25,30,6),
d=data_H, stanfit_obj=model04,
p=0.95, nbins=20, col_string=col_string)col_string = rep( rethink_palette[1], 2)
pred_speaker_pairs(speakers=c(20,8,11, 25,30,6),
d=data_H, stanfit_obj=model10,
p=0.95, nbins=20, col_string=col_string)par(mfrow=c(2,2))
plot_outlier(d=data_H, stanfit_obj=model01)
plot_outlier(d=data_H, stanfit_obj=model04)
plot_outlier(d=data_H, stanfit_obj=model07)
plot_outlier(d=data_H, stanfit_obj=model10)
par(mfrow=c(1,1))The second research question aimed to demonstrate the application of the Beta-proportion GLLAMM in estimating the latent potential intelligibility of speakers. This was achieved by employing the general mathematical formalism outlined in Equation 9, along with additional specifications provided in Table 2. The Bayesian procedure successfully estimated the latent potential intelligibility of speakers under Model 10 through the structural equation:
\begin{align} SI_{i} = \alpha + e_{i} + u_{i} \end{align} \tag{28}
Moreover, due to its implementation under Bayesian procedures, Model 10 provides the complete posterior distribution of the speakers’ potential intelligibility scores. This provision, in turn, (1) enables the calculation of summaries, facilitating the ranking of individuals, and (2) supports the assessment of differences among selected speakers. In both cases, the model considers the inherent uncertainty of the estimates resulting from its measurement using multiple entropy scores.
Table 5 and Figure 40 display the ranking of speakers in decreasing order based on point estimates of the latent potential intelligibility. These estimates are accompanied by their associated 95\% highest probability density intervals (HPDI). Both the table and figure clearly indicate that speaker 6 stands out as the least intelligible in the sample, followed farther behind by speaker 1, 17 and 9. In contrast, the figure highlights speaker 20 as the most intelligible, closely followed by speakers 23, 31 and 3. Conversely, Table 6 and Figure 41 show summaries and the full posterior distribution for the comparison of potential intelligibility among selected speakers. The table and figure reveal that only the differences between speakers 6, 1, 17, and 9, along with the difference between speakers 20 and 3 are statistically significant, as their associated 95\% HPDI did not overlap with zero (shaded area).
SI = pred_SI(d=data_H, stanfit_obj=model10, p=0.95)
SI = SI[order(SI$mean, decreasing=T), ]
SI = round(SI, 3)
SI$HPDI = paste0('[', SI$HPDI_lower, ', ', SI$HPDI_upper,']')kable( SI[,c(1:5,11)], align='rrrrrr', row.names=F, digits=3,
col.names=c('Speaker ID','Hearing status',
'Chron. Age','Chron. Age (centered)',
'Posterior mean','95% HPDI') )| Speaker ID | Hearing status | Chron. Age | Chron. Age (centered) | Posterior mean | 95% HPDI |
|---|---|---|---|---|---|
| 20 | 1 | 97 | 29 | 1.692 | [1.079, 2.294] |
| 23 | 1 | 94 | 26 | 1.185 | [0.595, 1.758] |
| 31 | 2 | 104 | 36 | 1.016 | [0.464, 1.576] |
| 3 | 1 | 93 | 25 | 0.958 | [0.42, 1.489] |
| 22 | 1 | 98 | 30 | 0.920 | [0.377, 1.454] |
| 25 | 2 | 93 | 25 | 0.919 | [0.378, 1.497] |
| 16 | 1 | 78 | 10 | 0.900 | [0.341, 1.498] |
| 21 | 1 | 88 | 20 | 0.869 | [0.322, 1.44] |
| 27 | 1 | 93 | 25 | 0.776 | [0.225, 1.303] |
| 5 | 1 | 82 | 14 | 0.539 | [0.003, 1.031] |
| 24 | 1 | 82 | 14 | 0.518 | [0.004, 1.018] |
| 14 | 2 | 85 | 17 | 0.397 | [-0.14, 0.894] |
| 18 | 1 | 86 | 18 | 0.256 | [-0.229, 0.768] |
| 15 | 2 | 85 | 17 | 0.225 | [-0.283, 0.74] |
| 26 | 2 | 85 | 17 | 0.201 | [-0.302, 0.693] |
| 10 | 1 | 82 | 14 | 0.047 | [-0.424, 0.529] |
| 13 | 2 | 85 | 17 | -0.012 | [-0.496, 0.47] |
| 4 | 1 | 80 | 12 | -0.051 | [-0.535, 0.432] |
| 28 | 2 | 93 | 25 | -0.060 | [-0.552, 0.42] |
| 32 | 1 | 78 | 10 | -0.066 | [-0.551, 0.397] |
| 7 | 2 | 104 | 36 | -0.069 | [-0.576, 0.416] |
| 30 | 2 | 84 | 16 | -0.131 | [-0.6, 0.367] |
| 12 | 1 | 82 | 14 | -0.264 | [-0.712, 0.189] |
| 2 | 2 | 68 | 0 | -0.265 | [-0.744, 0.216] |
| 8 | 1 | 86 | 18 | -0.341 | [-0.834, 0.114] |
| 19 | 2 | 86 | 18 | -0.398 | [-0.862, 0.075] |
| 11 | 1 | 82 | 14 | -0.550 | [-1.009, -0.12] |
| 29 | 2 | 80 | 12 | -0.558 | [-1.032, -0.129] |
| 9 | 2 | 83 | 15 | -0.912 | [-1.348, -0.489] |
| 17 | 2 | 76 | 8 | -0.916 | [-1.353, -0.494] |
| 1 | 2 | 85 | 17 | -0.920 | [-1.352, -0.477] |
| 6 | 2 | 85 | 17 | -2.122 | [-2.601, -1.652] |
plot_SI(d=data_H, stanfit_obj=model10,
p=0.95, decreasing=T)SI = contrast_SI(d=data_H, stanfit_obj=model10,
speakers=c(6,20), p=0.95, raw=T)
SI_contr = SI$SI_contr
SI_contr = round(SI_contr, 3)
SI_contr$names = rownames(SI_contr)
SI_contr$HPDI = paste0('[', SI_contr$HPDI_lower, ', ', SI_contr$HPDI_upper,']')idx_comp = c(1,21,13,52,60,6)
kable( SI_contr[idx_comp,c('names','mean','HPDI')], row.names=F, align='ccc',
col.names=c('Contrasts','Posterior mean','95% HPDI') )| Contrasts | Posterior mean | 95% HPDI |
|---|---|---|
| speaker06-speaker01 | -1.202 | [-1.626, -0.724] |
| speaker17-speaker06 | 1.206 | [0.762, 1.656] |
| speaker09-speaker06 | 1.211 | [0.76, 1.634] |
| speaker23-speaker20 | -0.507 | [-1.186, 0.242] |
| speaker31-speaker20 | -0.676 | [-1.351, 0.013] |
| speaker20-speaker03 | 0.734 | [0.067, 1.371] |
require(rethinking)
par(mfrow=c(2,3))
for(i in idx_comp){
dens( SI$SI_raw[[i]], xlim=c(-2.5,2.5),
col=rgb(0,0,0,0.7), show.HPDI=0.95,
xlab='Difference in potential intelligibility')
abline( v=0, lty=2, col=rgb(0,0,0,0.3))
mtext( text=names(SI$SI_raw)[i], side=3, adj=0, cex=1.1)
}
par(mfrow=c(1,1))This research question illustrates how theories on intelligibility can be examined within the model’s framework. Specifically, the focus centers on assessing the influence of speaker-related factors on intelligibility, such as chronological age and hearing status. Notably, despite RQ1 indicating the suitability of Beta-proportion GLLAMM models for entropy scores, existing statistical literature suggests that, in certain scenarios, models incorporating covariate adjustment exhibit robustness to misspecification in the functional form linking an outcome and covariates, commonly referred to as covariate-outcome relationship (Tackney et al., 2023). Consequently, this study compares all models detailed in Table 2. These models are characterized by different covariate adjustments on entropy scores or the latent potential intelligibility of speakers, namely chronological age and hearing status, while potentially exhibiting misspecification in the covariate-outcome relationship, as observed in the case of the Normal LMM.
Similar to RQ1, all criteria consistently identify the Beta-proportion GLLAMM outlined in models 11, 12 and 10 as the most plausible models for the data. The models exhibit the lowest values for both WAIC and PSIS, establishing them as the least deviating models among those under comparison. Moreover, Figure 42 depicts with horizontal blue lines the non-overlapping uncertainty for the models’ dWAIC and dPSIS values. This reveals that, when compared to Model 11, most models exhibit significantly distinct predictive capabilities. Models 12 and 10, however, stand out as exceptions to this pattern. This observation suggests that Models 11, 12, and 10 display the least deviation from perfect predictive accuracy in contrast to the other models. Lastly, the weight of evidence in Table 7 and tbl-rq3-psis, underscores that Model 11 accumulated the greatest support, followed by Model 12, and lastly, by Model 10.
set.seed(12345)
RQ3_WAIC = compare( func=WAIC,
model01, model02, model03,
model04, model05, model06,
model07, model08, model09,
model10, model11, model12 )
RQ3_WAIC = cbind( DIC=with(RQ3_WAIC, WAIC-2*pWAIC),
RQ3_WAIC )set.seed(12345)
RQ3_PSIS = compare( func=PSIS,
model01, model02, model03,
model04, model05, model06,
model07, model08, model09,
model10, model11, model12 )
RQ3_PSIS = cbind( DIC=with(RQ3_PSIS, PSIS-2*pPSIS),
RQ3_PSIS )par(mfrow=c(1,2))
plot_compare( compare_obj=RQ3_WAIC,
ns=1, m='WAIC', dm=T )
plot_compare( compare_obj=RQ3_PSIS,
ns=1, m='PSIS', dm=T )
par(mfrow=c(1,1))RQ3_WAIC$Model = as.integer( str_sub( rownames(RQ3_WAIC), start=-2) )
kable( round( RQ3_WAIC[,c(8,1:7)], 3), row.names=F, align='crrrrrrr',
escape=F, digits=2 )| Model | DIC | WAIC | SE | dWAIC | dSE | pWAIC | weight |
|---|---|---|---|---|---|---|---|
| 11 | -9741.51 | -9632.24 | 276.80 | 0.00 | NA | 54.63 | 0.46 |
| 12 | -9741.49 | -9631.66 | 276.82 | 0.58 | 1.00 | 54.91 | 0.34 |
| 10 | -9741.66 | -9630.63 | 276.64 | 1.61 | 2.97 | 55.52 | 0.20 |
| 9 | -9649.15 | -9586.67 | 274.35 | 45.56 | 18.01 | 31.24 | 0.00 |
| 8 | -9649.05 | -9586.41 | 274.33 | 45.83 | 18.01 | 31.32 | 0.00 |
| 7 | -9649.54 | -9586.00 | 274.50 | 46.24 | 18.19 | 31.77 | 0.00 |
| 6 | -2669.28 | -2027.11 | 126.86 | 7605.13 | 263.15 | 321.08 | 0.00 |
| 4 | -2670.62 | -2024.84 | 127.02 | 7607.40 | 263.22 | 322.89 | 0.00 |
| 5 | -2669.28 | -2024.58 | 127.06 | 7607.66 | 263.24 | 322.35 | 0.00 |
| 3 | -2279.58 | -1762.08 | 101.79 | 7870.16 | 266.68 | 258.75 | 0.00 |
| 1 | -2278.68 | -1761.10 | 101.80 | 7871.14 | 266.64 | 258.79 | 0.00 |
| 2 | -2279.35 | -1760.36 | 101.86 | 7871.88 | 266.69 | 259.49 | 0.00 |
RQ3_PSIS$Model = as.integer( str_sub( rownames(RQ3_PSIS), start=-2) )
kable( round( RQ3_PSIS[,c(8,1:7)], 3), row.names=F, align='crrrrrrr',
escape=F, digits=2 )| Model | DIC | PSIS | SE | dPSIS | dSE | pPSIS | weight |
|---|---|---|---|---|---|---|---|
| 11 | -9741.51 | -9631.16 | 276.88 | 0.00 | NA | 55.17 | 0.46 |
| 12 | -9741.49 | -9630.70 | 276.90 | 0.47 | 1.01 | 55.39 | 0.36 |
| 10 | -9741.66 | -9629.27 | 276.74 | 1.89 | 2.84 | 56.19 | 0.18 |
| 9 | -9649.15 | -9586.58 | 274.41 | 44.58 | 17.91 | 31.28 | 0.00 |
| 8 | -9649.05 | -9586.33 | 274.39 | 44.83 | 17.91 | 31.36 | 0.00 |
| 7 | -9649.54 | -9585.92 | 274.56 | 45.24 | 18.10 | 31.81 | 0.00 |
| 6 | -2669.28 | -2009.22 | 128.46 | 7621.94 | 263.52 | 330.03 | 0.00 |
| 4 | -2670.62 | -2007.66 | 128.57 | 7623.50 | 263.60 | 331.48 | 0.00 |
| 5 | -2669.28 | -2006.49 | 128.71 | 7624.67 | 263.62 | 331.39 | 0.00 |
| 3 | -2279.58 | -1754.43 | 102.07 | 7876.73 | 266.68 | 262.57 | 0.00 |
| 1 | -2278.68 | -1753.71 | 102.09 | 7877.46 | 266.64 | 262.48 | 0.00 |
| 2 | -2279.35 | -1752.86 | 102.13 | 7878.30 | 266.68 | 263.24 | 0.00 |
A closer examination of two models within this comparison set reveal the reasons behind the largest observed disparities. The Normal LMM, as outlined in Model 6, continues to face challenges in capturing underlying data patterns, resulting in predictions that are physically inconsistent, falling outside the outcome’s range. Additionally, the model persists in identifying highly unlikely and influential observations, making it inherently unreliable. In contrast, the Beta-proportion GLLAMM described by Model 12 appears to be less susceptible to ‘extreme’ scores, effectively capturing data patterns within the expected data range and thereby instilling greater confidence in the reliability of the model’s estimates. This contrast is visually depicted in Figure 43, Figure 44, Figure 45, and Figure 46.
plot_speaker(d=data_H,
stanfit_obj1=model06,
stanfit_obj2=model12,
p=0.95,
decreasing=F,
leg=c('model 06','model 12'))col_string = rep( rethink_palette[2], 2)
pred_speaker_pairs(speakers=c(20,8,11, 25,30,6),
d=data_H, stanfit_obj=model06,
p=0.95, nbins=20, col_string=col_string)col_string = rep( rethink_palette[1], 2)
pred_speaker_pairs(speakers=c(20,8,11, 25,30,6),
d=data_H, stanfit_obj=model12,
p=0.95, nbins=20, col_string=col_string)par(mfrow=c(2,2))
plot_outlier(d=data_H, stanfit_obj=model05)
plot_outlier(d=data_H, stanfit_obj=model06)
plot_outlier(d=data_H, stanfit_obj=model11)
plot_outlier(d=data_H, stanfit_obj=model12)
par(mfrow=c(1,1))Considering the results in Table 7, Table 8, and Figure 42, the model comparisons favor three distinct models: Model 10, 11 and 12. Model 10, supported by 20.4\% of the evidence, estimates a single intercept \alpha and no slope to explain the potential intelligibility of speakers (Table 9). In contrast, supported by 45.1\% of the evidence, Model 11 estimates distinct intercepts for each hearing status group, namely \alpha_{HS[1]} for NH speakers and \alpha_{HS[2]} for the HI/CI counterparts, while maintaining a single slope that gauges the impact of age on potential intelligibility estimates. The 95\% HPDI for the comparison of intercepts \alpha_{HS[2]}-\alpha_{HS[1]} reveal significant differences between NH and HI/CI speakers (Table 10). Lastly, with evidence of 34.1\%, Model 12 estimates one intercept and slope per hearing status group, namely \alpha_{HS[1]} and \beta_{A,HS[1]} for the NH speakers, and \alpha_{HS[2]} and \beta_{A,HS[2]} for the HI/CI counterparts. The 95\% HPDI for the comparison of intercepts and slopes reveal significant differences solely in the slopes between NH and their HI/CI counterparts (\beta_{A,HS[2]}-\beta_{A,HS[1]}, see Table 11).
However, a discerning reader can notice that these models yield conflicting conclusions regarding the influence of chronological age and hearing status on intelligibility. Model 10 implies no influence of chronological age and hearing status on the potential intelligibility of speakers. A visual inspection of Figure 47, however, reveals the reason for the model’s low support. Model 10 fails to capture the prevalent increasing age pattern observed in potential intelligibility estimates. In contrast, Model 11 identifies significant differences in potential intelligibility between NH and HI/CI speakers. The model further suggests that with the progression of chronological age, HI/CI speakers lag behind in intelligibility development, with no opportunity to catch up to their NH counterparts within the analyzed age range, as depicted in Figure 48. Finally, Model 12 indicates no significant differences in intelligibility between NH and HI/CI speakers at 68 months of age (around 6 years old). However, the model reveals distinct evolution patterns of intelligibility per unit of chronological age between different hearing status groups, with HI/CI speakers displaying a slower rate of development compared to their NH counterparts within the analyzed age range. The latter is evident in Figure 49.
par_int = c('a','aHS[1]','aHS[2]','bAm','bAmHS[1]','bAmHS[2]')
model_par = par_recovery(stanfit_obj=model10,
p=0.95, est_par=par_int)
vars_int = c('mean','HPDI_lower','HPDI_upper')
model_tab = model_par[,vars_int]
model_tab = round(model_tab, 2)
model_tab$names = c('$\\alpha$')
model_tab$HPDI = paste0('[', model_tab$HPDI_lower, ', ', model_tab$HPDI_upper,']')vars_tab = c('names','mean','HPDI')
opts = options(knitr.kable.NA="")
kable( model_tab[,vars_tab], row.names=F, align='ccc',
escape=F, digits=3,
col.names=c('Parameter','Posterior mean','95% HPDI') )| Parameter | Posterior mean | 95% HPDI |
|---|---|---|
| \alpha | 0.01 | [-0.09, 0.1] |
pred_intel(d=data_H, stanfit_obj=model10,
p=0.95, ns=500, seed=12345)par_int = c('a','aHS[1]','aHS[2]','bAm','bAmHS[1]','bAmHS[2]')
model_par = par_recovery(stanfit_obj=model11,
p=0.95, est_par=par_int)
contrs = contrast_intel(stanfit_obj=model11, p=0.95,
rope=c(-0.05,0.05))
vars_int = c('mean','HPDI_lower','HPDI_upper')
model_tab = rbind(model_par[,vars_int], rep(NA,3), rep(NA,3), contrs[,vars_int])
model_tab = round(model_tab, 2)
model_tab$names = c('$\\alpha$','$\\alpha_{HS[1]}$','$\\alpha_{HS[2]}$',
'$\\beta_{A}$', NA,
'Contrasts','$\\alpha_{HS[2]} - \\alpha_{HS[1]}$')
model_tab$HPDI = paste0('[', model_tab$HPDI_lower, ', ', model_tab$HPDI_upper,']')
model_tab[5, 1:5] = NA
model_tab[6, c(1:3,5)] = NAvars_tab = c('names','mean','HPDI')
opts = options(knitr.kable.NA = "")
kable( model_tab[,vars_tab], row.names=F, align='ccc',
escape=F, digits=3,
col.names=c('Parameter','Posterior mean','95% HPDI') )| Parameter | Posterior mean | 95% HPDI |
|---|---|---|
| \alpha | 0.01 | [-0.08, 0.11] |
| \alpha_{HS[1]} | 0.53 | [0.11, 0.94] |
| \alpha_{HS[2]} | -0.03 | [-0.43, 0.39] |
| \beta_{A} | 0.07 | [0.05, 0.1] |
| Contrasts | ||
| \alpha_{HS[2]} - \alpha_{HS[1]} | -0.55 | [-1, -0.15] |
pred_intel(d=data_H, stanfit_obj=model11,
p=0.95, ns=500, seed=12345)par_int = c('a','aHS[1]','aHS[2]','bAm','bAmHS[1]','bAmHS[2]')
model_par = par_recovery(stanfit_obj=model12,
p=0.95, est_par=par_int)
contrs = contrast_intel(stanfit_obj=model12, p=0.95,
rope=c(-0.05,0.05))
vars_int = c('mean','HPDI_lower','HPDI_upper')
model_tab = rbind(model_par[,vars_int], rep(NA,3), rep(NA,3), contrs[,vars_int])
model_tab = round(model_tab, 2)
model_tab$names = c('$\\alpha$','$\\alpha_{HS[1]}$','$\\alpha_{HS[2]}$',
'$\\beta_{A,HS[1]}$','$\\beta_{A,HS[2]}$', NA,
'Contrasts','$\\alpha_{HS[2]} - \\alpha_{HS[1]}$',
'$\\beta_{A,HS[2]} - \\beta_{A,HS[1]}$')
model_tab$HPDI = paste0('[', model_tab$HPDI_lower, ', ', model_tab$HPDI_upper,']')
model_tab[6, 1:5] = NA
model_tab[7, c(1:3,5)] = NAvars_tab = c('names','mean','HPDI')
opts = options(knitr.kable.NA = "")
kable( model_tab[,vars_tab], row.names=F, align='ccc',
escape=F, digits=3,
col.names=c('Parameter','Posterior mean','95% HPDI') )| Parameter | Posterior mean | 95% HPDI |
|---|---|---|
| \alpha | 0.01 | [-0.09, 0.11] |
| \alpha_{HS[1]} | 0.21 | [-0.28, 0.72] |
| \alpha_{HS[2]} | 0.23 | [-0.24, 0.69] |
| \beta_{A,HS[1]} | 0.10 | [0.07, 0.13] |
| \beta_{A,HS[2]} | 0.06 | [0.03, 0.09] |
| Contrasts | ||
| \alpha_{HS[2]} - \alpha_{HS[1]} | 0.01 | [-0.61, 0.74] |
| \beta_{A,HS[2]} - \beta_{A,HS[1]} | -0.04 | [-0.08, 0] |
pred_intel(d=data_H, stanfit_obj=model12,
p=0.95, ns=500, seed=12345)Given the considerable number of fitted models and the resulting abundance of parameters, this section opted to exclusively showcase the quality and information embedded in the Bayesian chains through models 6 and 12. The selection of these models is grounded in their parameter counts, with both registering the highest among those detailed in Section 7.3. It is crucial to underscore that a meticulous examination of all fitted models was conducted. Notably, all models demonstrated comparable results to those specifically chosen for illustrative purposes.
In general, both graphical analysis and diagnostic statistics indicated that all chains exhibited low to moderate autocorrelation, explored the parameter space in a seemingly random manner, and converged to a constant mean and variance in their post-warm-up phase. Figure 50 visualizes the \widehat{\text{R}} diagnostic statistic and Figure 51 through Figure 63 illustrate the chain’s graphical analysis.
par_int = c( 'a','aHS[1]','aHS[2]',
'bAm','bAmHS[1]','bAmHS[2]',
'm_b','s_b',
'm_i','s_i',
'm_u','s_u',
'r_s','r_M',
's_w','Mw')
model06_parameters = par_recovery(
stanfit_obj = model06,
est_par = par_int,
p = 0.95 )
model12_parameters = par_recovery(
stanfit_obj = model12,
est_par = par_int,
p = 0.95 )par( mfrow=c(1,2) )
plot( 1:nrow(model06_parameters), model06_parameters$Rhat4,
ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),
xaxt='n',xlab='', ylab='Rhat',
main='Normal LMM: model 06')
axis( side=1, at=1:nrow(model06_parameters),
labels=rownames(model06_parameters),
cex.axis=0.8, las=2 )
abline( h=1.05, lty=2, col=rgb(0,0,0,0.3) )
plot( 1:nrow(model12_parameters), model12_parameters$Rhat4,
ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),
xaxt='n',xlab='', ylab='Rhat',
main='Beta-proportion GLLAMM: model 12')
axis( side=1, at=1:nrow(model12_parameters),
labels=rownames(model12_parameters),
cex.axis=0.8, las=2 )
abline( h=1.05, lty=2, col=rgb(0,0,0,0.3) )
par( mfrow=c(1,1) )tri_plot( stan_object=model06,
pars=c('aHS[1]','aHS[2]') )tri_plot( stan_object=model12,
pars=c('a','aHS[1]','aHS[2]') )tri_plot( stan_object=model06,
pars=c('bAmHS[1]','bAmHS[2]') )tri_plot( stan_object=model12,
pars=c('bAmHS[1]','bAmHS[2]') )tri_plot( stan_object=model06,
pars=c('m_b','s_b') )tri_plot( stan_object=model12,
pars=c('m_b','s_b') )tri_plot( stan_object=model06,
pars=c('m_i','s_i') )tri_plot( stan_object=model12,
pars=c('m_i','s_i') )tri_plot( stan_object=model06,
pars=c('m_u','s_u') )tri_plot( stan_object=model12,
pars=c('m_u','s_u') )tri_plot( stan_object=model06,
pars=c('r_s', paste0('s_w[', 1:4,']')) )tri_plot( stan_object=model12,
pars=c('r_M', paste0('Mw[', 1:4,']')) )tri_plot( stan_object=model12,
pars=paste0('SI[', 1:5, ']') )Moreover, the density plots and n_{\text{eff}} statistics collectively confirmed that all posterior distributions are unimodal distributions with values centered around a mean, generated with a satisfactory number of uncorrelated sampling points, making substantive sense compared to the models’ prior beliefs. Figure 64 visualizes the n_{\text{eff}} diagnostic statistic and Figure 66 through Figure 71 illustrate the chains’ graphical analysis.
par( mfrow=c(1,2) )
plot( 1:nrow(model06_parameters), model06_parameters$n_eff,
ylim=c(0, 18000), pch=19, col=rgb(0,0,0,alpha=0.3),
xaxt='n',xlab='', ylab='Neff',
main='Normal LMM: model 06')
axis( side=1, at=1:nrow(model06_parameters),
labels=rownames(model06_parameters),
cex.axis=0.8, las=2 )
abline( h=seq(0, 18000, by=2000), lty=2, col=rgb(0,0,0,0.3) )
plot( 1:nrow(model12_parameters), model12_parameters$n_eff,
ylim=c(0, 18000), pch=19, col=rgb(0,0,0,alpha=0.3),
xaxt='n',xlab='', ylab='Neff',
main='Beta-proportion GLLAMM: model 12')
axis( side=1, at=1:nrow(model12_parameters),
labels=rownames(model12_parameters),
cex.axis=0.8, las=2 )
abline( h=seq(0, 18000, by=2000), lty=2, col=rgb(0,0,0,0.3) )
par( mfrow=c(1,1) )par_int = c('a','aHS','bAm','bAmHS')
dens_plot(stanfit_obj=model06, pars=par_int, p=0.95)par_int = c('a','aHS','bAm','bAmHS')
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)par_int = c('m_b','m_i','m_u','s_b','s_i','s_u')
dens_plot(stanfit_obj=model06, pars=par_int, p=0.95)par_int = c('m_b','m_i','m_u','s_b','s_i','s_u')
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)par_int = c('r_s','s_w')
dens_plot(stanfit_obj=model06, pars=par_int, p=0.95)par_int = c('r_M','Mw')
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)par_int = 'SI'
dens_plot(stanfit_obj=model12, pars=par_int, p=0.95)This study examined the suitability of the Bayesian Beta-proportion GLLAMM for the quantitative measuring and testing of research theories related to speech intelligibility using entropy scores. The initial findings supported the assertion that Beta-proportion GLLAMMs consistently outperformed Normal LMMs in predicting entropy scores, underscoring its superior predictive performance. The results emphasized that models neglecting the outcomes’ measurement error and boundedness lead to underfitting and misspecification issues, even when robust features are integrated. This is clearly illustrated by the Normal LMMs.
Secondly, the study showcased the Beta-proportion GLLAMM’s proficiency in estimating the latent potential intelligibility of speakers based on manifest entropy scores. Implemented under Bayesian procedures, the proposed model offered a valuable advantage over frequentist methods by further providing the full posterior distribution of the speakers’ potential intelligibility. This provision facilitated the calculation of summaries, aiding individual rankings, and supported the comparisons among selected speakers. In both scenarios, the proposed model accounted for the inherent uncertainty in the intelligibility estimates.
Thirdly, the study illustrated how the proposed model assessed the impact of speaker-related factors on potential intelligibility. The results suggested that multiple models were plausible for the observed entropy scores, indicating that different speaker-related factor theories were viable for the data, with some presenting contradictory conclusions about the influence of those factors on intelligibility. However, even when unequivocal support for one theory was not possible, the divided support among these models informed that certain statistical issues may be hindering the model’s ability to distinguish among individuals and, ultimately, among models. These issues encompassed the insufficient sample size of speakers, the inadequate representation of the population of speakers, and the imprecise measurement of the latent variable of interest.
Ultimately, this study introduced researchers to innovative statistical tools that enhanced existing research models. These tools not only assessed the predictability of empirical phenomena but also quantitatively measured the latent trait of interest, namely potential intelligibility, facilitating the comparison of research theories related to this trait. However, the presented tools introduce new challenges for researchers seeking their implementation. These challenges emerge from two distinct aspects: one methodological and the other practical. In the methodological domain, researchers need familiarity with Bayesian methods and the principled formulation of assumptions regarding the data-generating processes and research inquiries. This entails understanding and addressing each of the data and research challenges within the context of a statistical (probabilistic) model. Conversely, in the practical domain, researchers need familiarity with probabilistic programming languages (PPLs), which are designed for specifying and obtaining inferences from probabilistic models -the core of Bayesian methods. To ensure the successful utilization of this new statistical tool, this study addresses both challenges by providing comprehensive, step-by-step guidance in the form of this digital walk-through document.
This study provides valuable insights into the use of a novel approach to simultaneously address the different data features of entropy scores in speech intelligibility research. However, it is important to acknowledge the limitations of this study and explore potential avenues for future research.
Firstly, the study interprets potential intelligibility as an unobserved latent trait of speakers influencing the likelihood of observing a set of entropy scores. These scores, in turn, reflect the transcribers’ ability to decode words in sentences produced by the same speakers. Despite this practical approach, the construct validity of the latent trait heavily depends on the listeners’ appropriate understanding and execution of the transcription task. Construct validity, as defined by Cronbach and Meehl (1955), refers to the extent to which a set of manifest variables accurately represents a concept that cannot be directly measured. Considering the study assumes the transcription task set by Boonen and colleagues (Boonen et al., 2021) was properly understood and executed, it expects that potential intelligibility reflects the overall speech intelligibility of speakers. However, this study does not delve into the general epistemological considerations regarding the connection between the latent variable and the concept.
Secondly, the study identified a notable absence of unequivocal support for one of the compared models. This deficiency may be attributed to factors such as the insufficient sample size of speakers, the inadequate representation of the populations of speakers (referred to as selection bias), and the imprecise measurement of the latent variable. Insufficient sample size and selection bias yield data with limited outcome and covariates ranges, leading to biased and imprecise parameter estimates (Everitt & Skrondal, 2010). Furthermore, these issues, exacerbated by reduced measurement precision, can result in models with diminished statistical power and a higher risk of type I or type II errors (McElreath, 2020). Consequently, future research should consider conducting power analyses for the proposed models. This entails assessing the impact of expanding the speakers’ pool on testing research theories, or increasing the number of speech samples, transcriptions, and listeners to enhance the precision of potential intelligibility estimates. With these insights, future investigations should contemplate increasing the speaker sample with a group that adequately represents the population of interest. However, this must be done while mindful of the pragmatic limitations associated with transcription tasks, specifically considering the costs and time-intensiveness of the procedure.
Thirdly, the study presented an illustrative example for the investigation of research theories within the model’s framework. However, it did not offer an exhaustive evaluation of all factors influencing intelligibility, which are thoroughly explored in the works of Boons et al. (2012), Fagan et al. (2020), Gillis (2018), and Niparko et al. (2010). Consequently, the study cannot discard the presence of unobservable variables that might bias the parameter estimates, potentially impacting the inferences provided. Hence, future research should consider integrating appropriate causal hypotheses about these factors into the proposed models, as proper covariate adjustment facilitates the production of unbiased and precise parameter estimates (Cinelli, Forney, & Pearl, 2022; Deffner, Rohrer, & McElreath, 2022).
Lastly, this study proposes two directions for future exploration in speech intelligibility research. Firstly, there is an opportunity to investigate alternative methods for assessing speech intelligibility beyond transcription tasks and entropy scores. The experimental design of transcription tasks imply that the procedure may be time-intensive and costly. Thus, exploring less time-intensive or more cost-effective procedures, that still offer comparable precision in intelligibility estimates, could benefit both researchers and speech therapists alike. An illustrative example of such a method is Comparative Judgment (CJ), where judges compare and score the perceived intensity of a trait between two stimuli (Thurstone, 1927). In the context of the intelligibility trait, the stimuli under assessment could be the speech samples uttered by two speakers. Nevertheless, CJ serve as an ideal example as the method has gained increasing attention within the realm of educational assessment, with several studies providing evidence for its validity in assessing various task within student works, as demonstrated by examples in Pollit (2012a, 2012b), Lesterhuis (2018), van Daal (2020) and Verhavert et al. (2019).
Conversely, a second avenue for exploration involves integrating diverse data types and evaluation methods to assess individuals’ intelligibility. This can be accomplished by leveraging two features of Bayesian methods: their flexibility and the concept of Bayesian updating. Bayesian methods possess the flexibility to simultaneously handle various data types. Additionally, through Bayesian updating, researchers can integrate information from the posterior distribution of parameters as priors in models for subsequent evaluations. Ultimately, this could enable researchers to assess speakers’ intelligibility progress without committing to a specific data type or evaluation method. This advancement could mirror the emergence of second-generation Structural Equation Models proposed by Muthen (Muthén, 2001), where models facilitate the combined estimation of categorical and continuous latent variables. However, in the context of future research, the proposal would facilitate the estimation of latent variables using a combination of data types and evaluation methods, contingent upon the fulfillment of construct validity by those evaluation methods.
This study highlights the effectiveness of the Bayesian Beta-proportion GLLAMM to collectively address several key data features when investigating unobservable and complex traits, using speech intelligibility and entropy scores as an example. The results demonstrate the proposed model consistently outperforms the Normal LMM in predicting the empirical phenomena. Moreover, it exhibits the ability to quantify the latent potential intelligibility of speakers, allowing for the ranking and comparison of individuals based on the latent trait while accommodating associated uncertainties. Additionally, the proposed model facilitates the exploration of research theories concerning the influence of speaker-related factors on potential intelligibility. The study indicates that integrating and comparing these theories within the model’s framework is a straightforward task. However, the introduction of these innovative statistical tools presents new challenges for researchers seeking implementation. These challenges encompass the principled formulation of assumptions about the data-generating processes and research inquiries, along with the need for familiarity with probabilistic programming languages (PPLs) essential for implementing Bayesian methods. Nevertheless, the study suggests several promising avenues for future research, including power analysis, causal hypothesis formulation, and exploration and integration of novel evaluation methods for assessing intelligibility. The insights derived from this study hold implications for both researchers and data analysts interested in quantitatively measuring and testing theories related to nuanced, unobservable constructs, while also considering the appropriate prediction of the empirical phenomena.
Funding: The project was founded through the Research Fund of the University of Antwerp (BOF).
Conflict of interests: The authors declare no conflict of interest.
Ethics approval: This is an observational study. The University of Antwerp Research Ethics Committee has confirmed that no ethical approval is required.
Consent to participate: Not applicable
Consent for publication: All authors have read and agreed to the published version of the manuscript.
Availability of data and materials: The data is delivered upon request, while the user-defined functions are available in the code folder from this walk-through repository
Code availability: The code is available this walk-through.
Authors’ contributions: Conceptualization: S.G., S.dM., and J.M.R.E; Data curation: J.M.R.E.; Formal Analysis: J.M.R.E.; Funding acquisition: S.G. and S.dM; Investigation: S.G.; Methodology: S.G., S.dM., and J.M.R.E; Project administration: S.G. and S.dM.; Resources: S.G. and S.dM.; Software: J.M.R.E.; Supervision: S.G. and S.dM.; Validation: J.M.R.E.; Visualization: J.M.R.E.; Writing - original draft: J.M.R.E.; Writing - review & editing: S.G. and S.dM.
For a thorough explanation of the Bayesian inferences procedures the reader can refer to Kruschke (2015) or McElreath (2020).↩︎
The reader can refer to Brooks et al. (2011) for a detailed treatment on MCMC methods.↩︎
An interested reader can further refer to McElreath (2020) for a detailed explanation of grid approximation.↩︎
An interested reader can refer to McElreath (2020), Gorinova et al. (2019) and Neal (2003)↩︎